Windows and Linux CPU, Cache and RAM PC Benchmarks
Roy Longbottom
Summary
These benchmarks provide performance measurements over a wide range of data sizes, covering all caches and RAM, using different processing scenarios. In many cases, the programs have been compiled for both 32 bit and 64 bit systems. They emphasise the danger of comparing computer system performance by using a single number.
A single overall rating was a feature of the old Whetstone Benchmark, where comparisons of the 9 separate tests from a 2013 3900 MHz Core i7 with those from a 1992 66 MHz 80486 showed an average performance improvement of 239 times, with a range of 160 to 336 times (MHz ratio 59x). In the case of these memory tests, average and maximum improvements can be more than 1000 and 3000 times, the additional contributory factors being increased cache sizes and operating speed.
The benchmarks are as follows. In each case performance is measured in MBytes per second:
MemSpeed - carries out three different sets of single and double precision floating point and integer calculations via two data arrays, the Windows version using assembly code instructions. The Linux version uses compiled C code, with a variation in some calculations, enabling 32 bit and 64 bit varieties to be provided. For the latter, floating point operations normally use SSE type SIMD instructions, with up to four simultaneous calculations. This produced a respectable 6 single precision GFLOPS on a 3.9 GHz CPU, and more than 14 GFLOPS when compiled using the AVX1 directive.
BusSpeed - The benchmark is intended to demonstrate maximum data transfer speeds from buses, caches and RAM, using 32 or 64 bit integer words and data into 64 bit MMX or 128 bit SSE registers. On the latest PCs, use of multiple cores appears to be required, to achieve this goal. Reading starts by reading one word, with a large address increment for the next one, the increment being reduced by a half for following measurements, until all data is read. This identifies where data is read in bursts and provides a means of estimating bus and maximum RAM (or cache) speed. Reading all data is shown to take place at up to nearly 4 MIPS/MHz on the fastest PC tested, where multiple programs also indicated that RAM was working at 85% of specified maximum speed.
RandMem - Serial and random address selections are employed by this benchmark, using the same complex integer based indexing, with read and read/write tests for 32 bit integers and 64 bit floating point numbers. The main purpose is to show the difference between serial and random data transfer speed, where the speed of the latter is considerably reduced by burst reading or writing, and is in turn affected by data size. The full example included shows serial reading at up to 28 times faster than random access.
SSEfpu - This carries out floating point calculations, similar to MemSpeed, to compare data transfer speeds, and associated MFLOPS, between two at a time SSE2 double precision, four at a time SSE and single word calculations. GFLOPS obtained by that 3.9 GHz CPU were up to 5.1, 10.2 and 4.9 respectively. A later version for Linux included code that leads to linked multiply and add operations, producing up to eight floating point operations per clock cycle, or 31.2 GFLOPS on the 3.9 GHz CPU, with the benchmark demonstrating 25 GFLOPS.
FFT Benchmarks - Three versions were produced, the first being the original C code, the second with further optimised assembly language and the third using SSE SIMD instructions. The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run a number of times to identify variance, with results in milliseconds. The latest replaces the last two with an extensively modified C program. Memory used varies between 16 KB and 52 MB. The programs use skipped sequential memory access, making them susceptible to burst data transfer degradation. Reiterating the earlier Core i7 performance advantage over the 80486, the second version provided gains of between 939 and 1321 times.
Note - This document was converted by Winnovative Free HTML to PDF Converter to include in my ResearchGate material.
MemSpeed Benchmark
MemSpd2K is a full Windows benchmark that employs three different sequences of operations, on 64 bit double precision floating point numbers, 32 bit single precision numbers and 32 bit integers, via two data arrays:
Sum to register r = r + x[m] * y[m] (Integer + y[m])
Sum to memory x[m] = x[m] + y[m]
Memory to memory x[m] = y[m]
These are executed from assembly code which uses the same instructions as the original command line driven MemSpeed benchmark. The memory loading speed is calculated in terms of millions of bytes per second (MB/S). Measurements are made at 4000, 8000, 16000 etc. memory bytes, up to 25% of the main RAM size, to produce speed ratings via data from different levels of cache and from RAM.
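The three sequences above can be sketched in portable C as shown below. This is an illustrative sketch only, closer in spirit to the compiled Linux variety than to the Windows assembly code. The function names and the MB/S helper are assumptions for illustration and are not taken from the benchmark source.

```c
#include <stddef.h>

/* Sum to register: r = r + x[m] * y[m] (double precision variant). */
double sum_to_register(const double *x, const double *y, size_t n)
{
    double r = 0.0;
    for (size_t m = 0; m < n; m++)
        r = r + x[m] * y[m];
    return r;
}

/* Sum to memory: x[m] = x[m] + y[m]. */
void sum_to_memory(double *x, const double *y, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = x[m] + y[m];
}

/* Memory to memory: x[m] = y[m]. */
void memory_to_memory(double *x, const double *y, size_t n)
{
    for (size_t m = 0; m < n; m++)
        x[m] = y[m];
}

/* Hypothetical helper: millions of bytes per second for `passes`
   passes over `bytes` bytes of data completed in `seconds`. */
double mbytes_per_second(double bytes, double passes, double seconds)
{
    return bytes * passes / seconds / 1.0e6;
}
```

In a real measurement each function would be called repeatedly over arrays of the chosen size, with the elapsed time passed to the helper.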
A pre-compiled version of the benchmark can be found in MemSpd2K.zip, which also contains the source code, providing further explanatory comments. The original MemSpeed can be found in DOSTests.zip - file MDTRDOS.exe.
The benchmark has also been run on other platforms. Results are available for Android, Raspberry Pi and PC Linux.
The following is an example results log file. Conversion factors for MFLOPS and Integer MIPS are shown at the bottom. For floating point, double precision and single precision arithmetic speeds tend to be the same, unless limited by memory speed, and this is not the general case here.
The complex instruction set used for the assembly code includes instructions that add to registers directly from memory, rather than using separate load and add instructions. This reduces the instruction count, providing more MIPS per MegaByte of data transferred.
Core i7 4820K mainly running at 3.9 GHz using Turbo Boost
1600 MHz RAM over 4 channels, Windows 8.1
       Memory   s=s+x[m]*y[m] Int+       x[m]=x[m]+y[m]            x[m]=y[m]
       KBytes   Dble   Sngl    Int   Dble   Sngl    Int   Dble   Sngl    Int
         Used   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S
L1          4  20385  10312  15486  27497  13882  20125  24835  12456  12347
            8  20757  10406  15540  27765  13895  20401  24914  12487  12478
           16  20805  10419  15622  27649  13900  20376  25014  12502  12507
           32  20837  10409  15620  27787  13823  20378  24112  12498  12478
L2         64  20846  10427  15623  27493  13888  20145  22644  12260  12243
          128  20852  10379  15558  27515  13897  20187  23479  12405  12211
          256  20829  10380  15627  27003  13866  19975  21146  12058  11995
L3        512  20826  10424  15593  24664  13885  18948  14964  10811  10615
         1024  20750  10423  15630  24762  13824  18879  15122  10721  10625
         2048  20754  10423  15633  24771  13887  18911  14962  10781  10568
         4096  20481  10330  15317  23879  13497  18610  14938  10672  10532
         8192  19555  10025  14849  22361  13060  17826  13360  10250  10170
RAM     16384  16340   9264  13242  14314  11028  13304   7333   7599   7581
        32768  15927   9333  12964  13882  10785  12863   6995   7365   7343
        65536  16036   9226  13031  13939  10932  12893   7006   7388   7380
       131072  16251   9371  13134  13946  10989  13076   7089   7405   7375
       262144  16287   9391  13067  13971  10984  12991   7100   7368   7363
       524288  15998   9335  12869  13904  10959  12956   7060   7294   7353
      1048576  16357   9386  13032  13957  10981  12999   7081   7376   7391
Max            20852  10427  15633  27787  13900  20401  25014  12502  12507
FP Divide by       8      4            16      8
MFLOPS          2607   2607          1737   1738
Int Divide by                 2.91                 2.29                 1.45
MIPS                          5372                 8909                 8626
Maximum RAM speed 800 MHz x 2 DDR x 8 bus width x 4 channels = 51.2 GB/sec
Multiple cores need to be used for a higher throughput from RAM
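The conversion factors at the bottom of the log can be applied as in the following sketch. The divisors are those printed in the log. The interpretation given in the comments (bytes loaded per floating point operation) is an assumption that happens to fit the printed numbers and is not stated in the source, and the function names are hypothetical.

```c
/* Convert a measured MB/S figure to MFLOPS or MIPS using the
   "Divide by" factor printed at the bottom of the results log. */
double mbs_to_rate(double mbytes_per_sec, double divide_by)
{
    return mbytes_per_sec / divide_by;
}

/* Assumed reading of the floating point factors: bytes loaded per
   element divided by floating point operations per element.
   s = s + x[m] * y[m], double: 2 x 8 bytes, 2 operations -> 8
   x[m] = x[m] + y[m],  double: 2 x 8 bytes, 1 operation  -> 16 */
double fp_divisor(double bytes_per_value, double values_loaded, double flops)
{
    return bytes_per_value * values_loaded / flops;
}
```

For example, the maximum double precision figure of 20852 MB/S divided by 8 gives about 2607 MFLOPS, and the maximum integer figure of 15633 MB/S divided by 2.91 gives about 5372 MIPS, as shown in the log.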
Windows MemSpeed Results
Results below are a selection from those in memspd2k results.htm. These include MemSpeed results for early PCs which, as demonstrated by those for the 100 MHz Pentium, are normally very similar to those from the later benchmark. The exception was the Pentium 4 (see Slow below), where speeds can be slower on reading data in caches than from main memory. In this case, the two arrays are allocated at addresses that are multiples of 2048 bytes apart, and this appears to identify a design limitation with the Intel P4 CPU (and the version of Windows?), where inappropriate cache flushing is applied. This problem appears to have been rectified on later P4 CPUs (see P4E), but SSE3DNow Benchmark results are the preferred option, as that benchmark uses the same calculations to measure performance.
Below are separate speeds for data in L1 cache, L2 cache, L3 cache and RAM, for PCs over 22 years from 1991.
MemSpd2K L1 Cache Results in MBytes/Second
s=s+x[m]*y[m] x[m]=x[m]+y[m] x[m]=y[m]
CPU MHz Dble Sngl Int Dble Sngl Int Dble Sngl Int
AMD 80386 Not2K 40 7 4 16 5 3 11 4 2 9
80486 Not2K 66 37 20 71 34 18 64 29 16 30
Pentium Not2K 100 267 148 170 313 162 197 105 53 53
Pentium 100 220 122 145 296 149 156 113 53 52
Pentium Pro 200 892 482 559 896 487 697 782 394 350
Pentium MMX 200 577 303 374 667 335 424 355 153 172
Celeron A 300 1340 725 861 1348 731 1031 1172 590 526
Celeron 2 600 2704 1455 1734 2714 1467 2074 2366 1186 1058
Pentium II 450 2025 1049 1298 2039 1099 1184 1760 892 794
Pentium III 450 1954 1066 1258 1969 1073 1536 1720 862 768
Pentium IIIE 600 2688 1457 1550 2701 1463 2070 2354 1185 1053
Pentium IIIEB 800 3598 1950 2311 3610 1959 2763 3150 1587 1410
PIII Tualatin 1266 6102 2865 4454 6164 2875 4410 5675 2863 2502
Celeron M 1295 6524 3300 4495 8943 4425 5087 6711 3329 3395
Pentium M 1862 9691 4671 6505 12896 6495 6893 9230 4397 4884
Pentium 4 Not2K 1900 5689 2852 6320 9433 4000 5125 4769 2627 3466
Pentium 4 Slow 1900 1740 344 159 2657 1523 1292 4138 1803 1547
Pentium 4N Slow 2533 2350 451 222 3919 2060 1736 5353 2214 1767
Pentium 4N 2533 6490 2989 1761 6716 2527 2360 5286 2075 1728
Pentium 4E 3000 7830 3885 11355 13096 5704 6472 8560 5469 5737
Atom M 1600 3337 1776 4577 1869 941 4188 1322 669 2094
Core 2 Duo M 1830 9210 4687 6396 12591 5301 7036 11307 3597 5561
Celeron C2 M 2000 10405 5002 7373 13858 6202 7322 12357 4005 6198
Core 2 Duo 1 CP 2400 12556 6122 8921 16749 7683 9510 14924 4667 7536
Core i5 2467M 2467 11480 5804 8847 15882 7783 11275 13364 6873 7021
Core i7 4820K 3900 20757 10406 15540 27765 13895 20401 24914 12487 12478
Cyrix M300 225 242 143 296 260 130 284 215 108 148
AMD K62 450 977 524 1573 790 395 1419 588 323 1156
AMD K63 400 864 469 1399 700 354 1267 451 233 340
Duron 700 2733 1379 2756 4513 2441 2958 3193 1573 1993
Athlon 550 2145 1084 1873 3637 1902 2310 2494 1234 1549
Athlon Tbird 1000 3913 1980 3952 6767 3507 4243 4575 2244 2841
Athlon 4 1533 5916 3036 6065 10590 5382 6937 7302 3733 4327
Ath4 Barton 1800 6779 3538 7054 12488 6252 8087 8506 4345 5111
Turion 64 M 1900 7325 3736 7513 14610 6430 8063 9224 4598 4879
Athlon XP 2080 8148 4114 8229 14479 7313 9412 9892 5057 5950
Opteron 2000 7869 3941 7892 15565 7005 8430 9884 4969 5911
Athlon 64 2210 8656 4379 8601 17241 7771 9207 10801 5525 6636
Phenom II 3000 11821 5930 11931 21669 11792 13040 14829 7317 8731
MemSpd2K L2 and L3 Cache Results in MBytes/Second
s=s+x[m]*y[m] x[m]=x[m]+y[m] x[m]=y[m]
CPU MHz Dble Sngl Int Dble Sngl Int Dble Sngl Int
80486 Not2K 66 25 15 29 20 13 27 14 10 18
Pentium Not2K 100 121 89 100 93 74 82 60 37 43
Pentium 100 105 76 87 111 85 84 94 46 46
Pentium Pro 200 667 436 553 377 346 325 286 240 229
Pentium MMX 200 235 170 202 158 143 158 101 68 73
Celeron A 300 909 620 756 747 560 649 402 362 324
Celeron 2 600 2784 1359 1727 2143 1067 1388 1312 929 942
Pentium II 450 1188 656 715 525 393 521 275 241 220
Pentium III 450 1229 657 733 532 434 561 292 251 285
Pentium IIIE 600 2406 1315 1645 2127 1184 1380 1154 919 887
Pentium IIIEB 800 3710 1821 2317 2870 1449 1857 1747 1300 1258
Pentium IIIEB 1000 4626 2267 2888 3568 1815 2309 2170 1623 1532
PIII Tualatin 1266 5743 2935 3505 5073 2452 2939 2869 2034 1966
Celeron M 1295 6462 3333 3543 4760 3432 3234 3427 2556 2450
Pentium M 1862 9278 4792 5127 6777 4935 4694 4272 3541 3702
Pentium 4 Not2K 1900 5896 2865 3712 7529 3523 4650 3893 2151 2942
Pentium 4 Slow 1900 1719 1022 90 2389 1261 1170 3153 1554 1267
Pentium 4N Slow 2533 2034 1669 125 3537 1577 1536 5461 1805 1838
Pentium 4N 2533 6381 2935 1764 5900 2365 2326 5345 2067 1643
Pentium 4E 3000 7644 3856 4334 8084 4734 6581 6336 4062 4527
Atom M 1600 2651 1585 3301 1805 914 2972 1338 669 1437
Core 2 Duo M 1830 9357 4725 6168 8651 5609 5872 5943 3760 3807
Celeron C2 M 2000 10581 5289 6996 9569 6291 6564 6529 3905 3799
Core 2 Duo 1 CP 2400 12755 6380 8463 11561 7578 7928 7798 5328 5349
Core i5 2467M 2467 11709 5977 8932 15714 7518 11419 13219 6796 7062
Core i7 4820K 3900 20852 10379 15558 27515 13897 20187 23479 12405 12211
Cyrix M300 225 175 115 208 172 104 173 110 90 98
AMD K62 450 434 313 465 292 216 307 175 172 172
AMD K63 400 674 364 747 539 305 702 424 227 317
Duron 700 1477 1073 1007 1373 806 901 947 637 570
Athlon 550 772 640 639 693 469 559 447 378 345
Athlon Tbird 1000 2636 1792 1661 2089 1237 1373 1484 974 876
Athlon 4 1533 3565 2685 2609 3119 1866 2099 2102 1539 1349
Ath4 Barton 1800 3985 3068 2958 3563 2202 2532 2439 1849 1569
Turion 64 M 1900 4603 3554 3595 3601 2088 3139 2625 1807 1714
Athlon XP 2080 4663 3567 3444 4148 2614 2940 2840 2151 1823
Dual Opteron 2000 5102 3940 4089 3930 2252 2402 3305 2244 2197
Athlon 64 2210 4322 4388 4661 4883 2734 3921 3789 2507 2487
Phenom II 3000 11839 6017 11581 14976 10128 10365 8189 6680 6368
L3 Cache
Phenom II 3000 8530 5808 7261 8091 6890 7350 4355 3787 3807
Core i5 2467M 2300 11759 5910 8748 14039 7464 10547 9051 6226 6195
Core i7 1 CP 3060 15391 7801 10808 11034 5524 10451 9204 5814 6495
Core i7 3820 &&&& 19730 9980 14787 22746 13090 17301 14401 10074 9885
Core i7 4820K 3900 20481 10330 15317 23879 13497 18610 14938 10672 10532
MemSpd2K RAM Speed Results in MBytes/Second
s=s+x[m]*y[m] x[m]=x[m]+y[m] x[m]=y[m]
CPU MHz Dble Sngl Int Dble Sngl Int Dble Sngl Int
AMD 80386 Not2K 40 6 4 11 4 3 8 4 2 7
80486 Not2K 66 16 12 18 11 9 12 8 7 8
Pentium Not2K 100 59 50 54 42 39 41 30 21 22
Pentium 100 60 49 54 50 44 45 41 26 27
Pentium Pro P0 200 138 134 138 100 85 89 49 51 49
Pentium MMX P0 200 130 107 118 99 81 84 74 56 59
Celeron A P0 300 347 195 230 189 133 142 96 95 95
Celeron 2 P0 600 418 239 309 255 163 166 137 123 127
Pentium II P1 450 492 253 305 270 171 187 142 142 142
Pentium III P1 450 503 235 335 300 199 198 161 160 163
Pentium IIIE P1 600 404 305 308 241 152 161 153 124 127
Pentium IIIEB P2 800 771 434 551 313 224 222 157 152 152
PIII Tualatin P2 1266 663 630 630 370 368 364 185 188 186
Celeron M 1295 1431 1340 1349 868 814 809 447 446 437
Pentium M DC1 1862 2473 2682 2770 1369 1373 1370 711 697 693
Pentium 4 Not2K P2 1900 843 839 832 544 552 551 273 277 277
Pentium 4 P2 1900 822 662 301 578 511 567 295 298 297
Pentium 4N R2 2533 3052 2407 1650 1643 1582 1566 872 855 847
Pentium 4N DC1 2533 2304 2001 1632 1339 1280 1271 683 673 666
Pentium 4E DC2 3000 3430 3052 3115 2323 2310 2219 1141 1123 1135
Atom M DCC 1600 2334 1476 2827 1568 884 1693 900 653 901
Core 2 Duo M DC4 1830 3939 3794 3924 2476 2373 2398 1257 1206 1171
Celeron C2 M DC3 2000 3129 3010 2786 1898 1905 1872 946 953 959
Core 2 Duo * DC3 2400 4816 4761 4828 3068 3085 3081 1568 1539 1547
Core i5 2467M DC7 2467 9970 5532 7847 9589 6939 8323 5111 4682 4640
Core i7 4820 DC8 4c 3900 16287 9391 13067 13971 10984 12991 7100 7368 7363
Cyrix M300 225 100 78 112 83 65 84 45 42 43
AMD K62 P1 450 219 191 217 142 123 138 74 73 72
AMD K63 P1 400 165 157 171 108 98 110 84 50 50
Duron P2 700 413 264 265 270 217 229 252 171 163
Athlon P1 550 296 260 259 247 211 223 186 140 140
Athlon Tbird P2 1000 313 293 292 303 235 281 238 177 170
Athlon 4 D1 1533 790 764 756 766 674 719 422 393 367
Ath4 Barton #D1 1800 569 562 557 397 397 398 200 198 187
Turion 64 M DC3 1900 2517 2453 2566 2078 1808 1781 1147 1101 1008
Athlon XP D2 2080 1221 1218 1169 960 896 927 523 487 456
Opteron D3 2000 2338 2347 2360 2182 1818 1783 1171 1089 1097
Athlon 64 DC2 2210 3023 2962 2942 2004 1952 1975 1076 1047 980
Phenom II DC7 3000 4993 4146 4240 4285 3890 4215 2323 2100 2087
Key P1 100 MHz P2 133 MHz
D1 DDR 133 MHz D2 DDR 166 MHz
D3 DDR 200 MHz DC1 Dual Channel DDR 133 MHz
DC2 Dual Channel DDR 200 MHz DC3 DDR2 533 MHz
DC4 DDR2 666 MHz DC5 DDR2 800 MHz
DC6 DDR3 1066 MHz DCC DDR2 533 MHz 1 channel
DC7 DDR3 1333 MHz DC8 DDR3 1600 MHz 1 2 4 channels
R1 RDRAM 400 MHz R2 RDRAM 533 MHz
# Slow speed examples (Ath4 slow chipset, Core 2 Duo slow nForce 570 chipset)
* Core 2 Duo Intel 965 chipset M = Mobile
BusSpeed Benchmark
BusSpd2K benchmark is intended to demonstrate maximum data transfer rates from caches and RAM using 32 bit integer words and 64 bit MMX words. MOV and AND assembly code instructions are used, with 64 instructions in the inner loops for integers and 512 instructions for MMX. The program measures speeds with data size 4, 8, 16, 32 etc. KBytes up to a maximum of 50% RAM size. Results are given in MBytes/second (MB/s), where M = 1,000,000. An approximation of processor execution speed in Millions of Instructions Per Second (MIPS) can be obtained by dividing MB/s for integer tests by 4 and those for MMX tests by 8.
Ten different tests load data to one CPU register or 2 registers alternately (MMX 1 or 8). Tests 5 and 6 use MOV instructions to 1 and 2 integer registers, with tests 7 and 8 the same except using AND. These identify differences between CPU models. Tests 9 and 10 use MMX MOV to 1 and 8 registers, normally demonstrating maximum data transfer speeds.
Tests 1 to 4 load a 32 bit word (4 bytes) with address increments of 64, 32, 16, 8 bytes respectively. These are intended to demonstrate bus operation and speed where data is transferred in bursts.
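Reading with a fixed address increment can be sketched in C as below. This is a minimal sketch under stated assumptions: the benchmark itself uses unrolled assembly code MOV and AND instructions, whereas this plain loop uses AND throughout (as the later compiled C version is described as doing) so that a compiler cannot discard the loads. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AND together 32 bit words read with a fixed address increment.
   inc_bytes = 64, 32, 16 or 8 corresponds to tests 1 to 4, and
   inc_bytes = 4 reads every word. */
uint32_t and_with_increment(const uint32_t *data, size_t bytes,
                            size_t inc_bytes)
{
    assert(inc_bytes >= sizeof(uint32_t));
    size_t step = inc_bytes / sizeof(uint32_t);
    size_t words = bytes / sizeof(uint32_t);
    uint32_t result = 0xFFFFFFFFu;
    for (size_t i = 0; i < words; i += step)
        result &= data[i];
    return result;
}

/* Approximate MIPS from MB/S: 4 bytes per integer instruction or
   8 bytes per MMX instruction, as described in the text. */
double approx_mips(double mbytes_per_sec, int bytes_per_instruction)
{
    return mbytes_per_sec / bytes_per_instruction;
}
```

With an increment of 64 bytes only one word in sixteen is touched, so the reported MB/S reflects how quickly whole bursts arrive rather than how many bytes the program actually uses.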
A pre-compiled version of the benchmark can be found in busspd2k.zip, which also contains the source code, providing further explanatory comments.
The benchmark has also been run on other platforms. Results for Android and Raspberry Pi are already available at ResearchGate. Later, the intention is to upload further reports for Linux, multi-core and stress testing versions. A summary of further details can be found in busspd2k results.htm.
The following represents the best performance that could be expected on a May 2014 desktop, assuming no overclocking. It is followed by some single thread results from the later multithreading version, which has different memory address increments and includes SSE2 functions instead of MMX. There is also a 64 bit compilation that uses 64 bit integers. Here, measured MB/second can be twice as high as with the 32 bit program, implying the same execution time using larger registers.
The later version also consists entirely of compiled C code, using long sequences of AND functions.
Core i7 4820K mainly running at 3.9 GHz using Turbo Boost
32 GB 1600 MHz RAM over 4 channels, Windows 8.1
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Memory Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
KBytes Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
L1 4 14612 26366 28608 29964 30626 30592 15551 29027 60254 60322
8 15197 28712 29947 30605 30976 30982 15600 29350 61449 61471
16 15295 29857 30563 30945 31100 31108 15611 29461 61980 61989
32 14584 24315 28156 29773 30122 30111 15602 29073 55568 55455
L2 64 7032 12156 17523 22653 26520 26556 15624 26426 29089 29155
128 7638 12224 17650 22580 26579 26564 15635 26468 29210 29137
256 6983 11598 17081 22241 26317 26326 15614 26259 28745 28769
L3 512 2797 5461 9755 16920 24858 24859 15631 24818 26636 26638
1024 2744 5378 9685 16783 24756 24730 15622 24729 26514 26529
2048 2747 5371 9676 16744 24722 24705 15624 24621 26457 26471
4096 2739 5346 9636 16370 24336 24341 15633 24289 25899 25864
8192 2365 4557 8479 14462 21502 21522 15083 21488 23089 23097
R 16384 969 2113 4167 8377 13906 13890 13099 13922 14502 14519
32768 928 2003 3887 8154 13591 13587 13046 13593 14045 14045
65536 931 2011 3905 8206 13639 13624 13076 13632 14075 14095
131072 944 2055 3914 8276 13672 13670 13138 13701 14141 14146
262144 945 2055 3920 8305 13709 13686 13110 13701 14136 14151
524288 933 2024 3918 8225 13666 13657 13101 13648 14107 14117
1048576 945 2059 3919 8276 13681 13670 13124 13696 14132 14137
R = RAM
Maximum speed 800 MHz x 2 DDR x 8 bus width x 4 channels = 51.2 GB/sec
Multiple cores need to be used for a higher throughput from RAM
Later Multithreading Version, Single Thread Results
Inc Inc Inc Inc Inc Read 128b
32wds 16wds 8wds 4wds 2wds All SSE2
32 Bit
L1 15642 15642 22493 21590 21709 21375 61610
L2 2782 2904 5623 9806 17348 20363 40673
RAM 644 934 1994 3842 8098 13852 15963
64 Bit ##
L1 31565 31291 31178 42042 42508 41978 61606
L2 5375 5559 5793 11083 20009 34332 40516
RAM 1034 1272 1866 4023 7724 16029 15980
## 64 bit wds
Example 16 32b words = Inc64B and 8 64b words = Inc64B
Windows Bus Speed Results
On loading registers with varying address increments, the size of a burst of data over a bus can be recognised as the point where data transfer speed becomes constant, for example, 32 bytes (8 words) on the Celeron A below, and 64 bytes (16 words) on the others. Maximum possible bus burst data transfer speed can be estimated from these, as 62 x 8 MB/second for the Celeron A and 62 x 16 MB/second for the Duron below it. The multithreading results above suggest even larger bursts, particularly using 64 bit words.
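The burst estimate described above can be sketched as follows. This is a hedged illustration, not the author's procedure: the plateau test, the tolerance parameter and the function names are assumptions chosen so that the sketch reproduces the Celeron A and Duron figures quoted in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Speeds are ordered from the largest increment to the smallest,
   for example Inc64, Inc32, Inc16, Inc8 (increments halve each step).
   Returns the smallest increment whose speed is no more than
   `tolerance` (e.g. 0.1 = 10%) above the speed at the largest one. */
int estimate_burst_bytes(const double *speeds, size_t count,
                         int largest_inc_bytes, double tolerance)
{
    assert(count > 0 && largest_inc_bytes > 0);
    int burst = largest_inc_bytes;
    int inc = largest_inc_bytes;
    for (size_t i = 1; i < count; i++) {
        inc /= 2;
        if (speeds[i] > speeds[0] * (1.0 + tolerance))
            break;              /* speed has started to rise */
        burst = inc;
    }
    return burst;
}

/* Each word read at the burst increment pulls in a whole burst, so
   bus speed is roughly word speed x (burst bytes / word bytes). */
double estimate_burst_speed(double speed_at_burst_inc, int burst_bytes,
                            int word_bytes)
{
    return speed_at_burst_inc * burst_bytes / word_bytes;
}
```

Using the Celeron A figures of 62, 62, 122 and 246 MB/S, the sketch reports a 32 byte burst and 62 x 8 = 496 MB/second. The Duron figures of 62, 123, 244 and 496 MB/S give a 64 byte burst and 62 x 16 = 992 MB/second.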
Theoretical maximum data transfer speeds, for more modern PCs, are calculated as 8 (bus width) x bus MHz x 2 (Double Data Rate) x number of channels, 8 x 800 x 2 x 4 = 51.2 GB/second for the Core i7, then 667 MHz and 2 channels for the Phenom (666.6 x 32) and 400 MHz and 2 channels (400 x 32) for the Core 2 Duo.
With the 8 byte wide bus, 8 data transfers are required for a 64 byte burst, or 4 clock pulses using DDR. There is also the complication of data transfer startup time (CAS latency), which is 9 clocks for both the Core i7 and Phenom RAM. However, this can be overlapped with continuous data transfers. The latter are influenced by how fast the CPU can handle the data, and it is clear that multiple cores might be required.
Examples of multi core use are below. Multithreading has its own inherent overheads, demonstrating 71% efficiency on burst reading and 61% ANDing all data on the Phenom, with Core i7 some 10% better. The 4 separate programs on the Core i7 are shown to achieve 85% of the specified maximum speed.
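The theoretical maximum and the efficiency percentages quoted above follow from simple arithmetic, sketched below. The function names are hypothetical. The formula is the one given in the text, with an 8 byte bus width assumed as the usual case.

```c
/* Theoretical maximum RAM speed in MB/second:
   bus width (bytes) x bus MHz x 2 (Double Data Rate) x channels. */
double max_ram_mbs(double bus_width_bytes, double bus_mhz, int channels)
{
    return bus_width_bytes * bus_mhz * 2.0 * channels;
}

/* Measured speed as a percentage of the theoretical maximum. */
double percent_of_max(double measured_mbs, double max_mbs)
{
    return 100.0 * measured_mbs / max_mbs;
}
```

For the Core i7, 8 x 800 x 2 x 4 gives 51200 MB/second, and the 43547 MB/second from 4 separate programs is then about 85% of that. For the Phenom, with a maximum of 21333 MB/second, the 4 thread figures of 15152 and 13000 MB/second work out at about 71% and 61%.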
Single CPU Core Tests
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
CPU Max Max Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
Bus Burst Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
Celeron A 800 496 62 62 122 246 406 408 425 427 494 492
Duron 1067 992 62 123 244 496 682 681 515 515 947 946
Pentium 4 1067 1008 63 128 259 500 969 942 954 969 997 1000
Core 2 Duo 12800 6224 389 794 1448 2691 5020 5023 4884 4813 5617 5657
Phenom II 21333 7392 462 901 1771 3443 5380 5355 5225 5240 6934 6936
Core i7 51200 15120 945 2059 3919 8276 13681 13670 13124 13696 14132 14137
Multiple Core Tests
Phenom 4 Threads 15152 13000
Corei7 4 Threads 42112 35915
Corei7 4 programs 43547
BusSpd2K L1 Cache Results in MBytes/Second
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
MHz Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
80486 DX2 66 112 117 119 124 136 122 120 123 0 0
Pentium 100 316 355 637 679 385 713 195 380 0 0
Pentium Pro 200 679 748 769 764 775 779 758 756 0 0
Pentium MMX 200 699 735 1428 1444 776 1470 393 764 1568 1567
Celeron A 450 1617 1690 1729 1677 1724 1737 1703 1700 3517 3507
Pentium II 450 1515 1649 1745 1728 1763 1765 1710 1717 3520 3527
Pentium III 450 1624 1700 1738 1735 1759 1740 1742 1744 3513 3491
AMD K62 500 1633 1742 1685 1706 1783 1780 1725 1755 3593 3509
Celeron 2 566 2042 2136 2198 2184 2213 2210 2191 2194 4446 4445
Duron 700 2530 4807 5011 4985 4941 5034 2677 4935 10169 10151
Pentium IIIE 733 2652 2778 2852 2833 2849 2869 2844 2847 5768 5765
Athlon 800 2917 5508 5754 5735 5791 5909 3135 5777 11951 11888
Athlon Tbird 1000 3635 6738 6331 7084 7245 7391 3830 7224 14871 14808
PIII Tualatin 1266 4588 4802 4914 4862 4949 4967 4806 4804 9898 9897
Atom M 1600 5447 5628 5792 5901 5988 5973 5984 5970 12268 12480
Athlon 4 1533 5594 10379 11155 11496 11161 11173 6016 11122 22870 22845
Pentium 4 1700 6139 6343 6559 6639 6589 6540 6405 6428 13188 13276
Ath4 Barton 1800 6525 12165 12999 13367 13020 12917 7013 11689 26630 26632
Core 2 Duo M 1830 6700 6879 7061 7200 7251 7215 7214 7249 14461 14448
Pentium M 1862 6744 6875 7117 7320 7371 7381 7255 7374 14424 14658
Turion 64 1900 6872 13397 14159 14798 14190 14094 7294 14098 29407 29390
Opteron 2000 7237 14069 14802 15473 14822 15008 7783 14726 30715 30698
Celeron C2 M 2000 7362 7622 7792 7845 7877 7800 7597 7911 15124 15648
Athlon XP 2080 7585 14104 15097 15617 15011 14995 7982 15009 31009 30959
P4 Xeon 2200 7947 8301 8495 8593 8559 8561 8336 8414 17176 17184
Athlon 64 2210 8070 15711 16498 17247 16538 16763 8670 16454 34291 34254
Core i5 2467M 2300 8166 15146 17474 17504 17348 18026 8298 17087 35822 35258
Core 2 Duo 2400 8640 8820 9339 9451 9530 9530 9477 9523 18930 18909
Pentium 4E HT 3000 9686 11043 11233 11525 11562 11227 11054 11099 22804 22657
Pentium 4 3000 10915 11458 11710 11853 11784 11790 11426 11238 23589 23500
Phenom II 3000 22764 22849 23433 23768 23938 23934 12019 22553 46887 46911
Core i7 930 3066 11251 11488 11620 11614 11712 11719 5873 11718 23391 23398
Core i7 860 3466 12977 13465 13645 11701 13556 13349 6794 13742 27450 26951
Pentium 4 3678 13412 13879 14306 14358 14252 14422 13007 13473 28713 28818
Core i7 4820K 3900 15197 28712 29947 30605 30976 30982 15600 29350 61449 61471
BusSpd2K L2 and L3 Cache Results in MBytes/Second
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
MHz Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
80486 DX2 66 11 11 11 17 32 31 30 30 0 0
Pentium 100 26 26 40 75 124 139 96 117 0 0
Pentium Pro 200 133 132 234 317 488 487 454 453 0 0
Pentium MMX 200 53 53 75 131 235 235 192 232 264 264
Celeron A 450 306 305 548 793 975 975 974 976 1582 1619
Pentium II 450 179 179 359 709 829 824 831 832 1428 1433
Pentium III 450 180 180 359 531 846 846 843 846 1430 1437
AMD K62 500 29 59 117 218 436 436 429 429 436 436
Celeron 2 566 532 533 1125 1205 1392 1392 1389 1392 2410 2409
Duron 700 134 270 535 1029 1932 1955 1577 1533 2050 2008
Pentium IIIE 733 697 697 1466 1568 1805 1808 1809 1809 3135 3131
Athlon 800 106 211 424 846 1697 1698 1599 1588 1693 1698
Athlon Tbird 1000 198 360 788 1584 3144 3169 2572 2469 3104 3146
PIII Tualatin 1266 1701 1575 2513 2520 2882 2864 2879 2881 5038 5034
Atom M 1600 379 739 1385 2412 3624 3690 3683 3681 4769 4718
Pentium 4 1700 2617 3077 3544 3570 4658 4656 4598 4628 7143 7117
Ath4 Barton 1800 355 713 1421 2799 4863 4851 4009 4462 5682 5622
Core 2 Duo M 1830 1597 2523 3475 5130 6227 6234 6233 6012 7950 7976
Pentium M 1862 1214 2117 3289 4031 4731 4668 4732 4749 8077 8109
Turion 64 1900 429 831 1688 2976 5490 5383 5467 5457 5939 6086
Opteron 2000 670 1296 2588 4700 7167 7163 5870 6201 9480 9542
Celeron C2 M 2000 1791 2765 3799 5516 6812 6816 6812 6805 8747 8557
Athlon XP 2080 413 828 1645 3270 5637 5601 4597 5163 6564 6567
P4 Xeon 2200 4190 4021 4577 4630 6038 6038 6010 6020 9255 9258
Athlon 64 2210 651 1285 2411 4418 7786 7776 6448 6688 8936 8718
Core i5 2467M 2300 4196 6815 9583 12742 12941 13572 8563 14720 14156 17136
Core 2 Duo 2400 2131 3257 4597 6772 8187 8196 8168 8201 10549 10559
Pentium 4E HT 3000 2945 5640 6105 6624 7526 7536 7425 7470 13097 13303
Pentium 4 3000 5912 5521 6335 6385 8338 8337 8298 8322 12762 12779
Phenom II 3000 1500 2995 5986 11360 15036 15036 11918 15233 22377 22367
Core i7 930 3066 3213 4805 7305 9467 10811 10810 5875 10805 14442 14408
Core i7 860 3466 3595 5003 8442 11028 12618 12639 6895 12408 16719 16788
Pentium 4 3678 7258 6719 7722 7808 10161 10201 10169 10064 15423 15560
Core i7 4820K 3900 7638 12224 17650 22580 26579 26564 15635 26468 29210 29137
L3 Cache
Core i5 2467M 2300 1807 3499 5553 9167 14017 14395 9017 13494 15050 14363
Phenom II 3000 745 1485 2974 5881 9833 9825 9615 9603 11726 11650
Core i7 930 3066 2004 3497 5958 9088 10447 10448 5870 10447 13857 13857
Core i7 860 3466 2262 3537 6992 10641 12204 12233 6319 10478 15059 16251
Core i7 4820K 3900 2744 5378 9685 16783 24756 24730 15622 24729 26514 26529
BusSpd2K RAM Results in MBytes/Second
Max Max MovI MovI AndI AndI MMX
System MHz bus Burst Reg1 Reg2 Reg1 Reg2 Max
80486 DX2 B 66 133 32 25 24 23 24 0
Pentium B 100 400 96 73 79 64 73 0
Celeron 2 # 900 800 168 166 166 166 165 166
Pentium MMX B 200 533 232 140 140 123 139 140
Pentium Pro 200 533 256 225 225 240 240 0
AMD K6 B 550 800 272 238 238 237 238 238
Pentium IIIEB # 1000 1067 289 289 289 289 289 289
Celeron A 300 533 456 267 267 282 280 450
Celeron A 450 800 496 407 406 426 427 494
Pentium II H 400 800 488 314 314 322 322 484
Pentium II H 450 800 504 317 316 324 325 500
Celeron 2 600 533 504 324 326 343 343 511
Pentium III H 450 800 528 303 304 339 334 527
Ath4 Barton # 1800 2133 592 589 590 433 492 594
Athlon Tbird # 1200 1067 672 528 527 351 328 670
Athlon H 800 800 672 575 575 414 366 673
Pentium IIIE 800 800 752 463 462 477 476 764
Celeron 2 850 800 784 474 474 486 486 765
Athlon H 900 1067 912 648 648 461 416 879
PIII Tualatin 1266 1067 912 580 579 580 575 749
Duron 700 1067 994 682 685 512 516 977
Pentium IIIEB R 1000 1600 1024 411 412 420 420 794
Pentium 4 2400 1067 1027 987 989 982 990 1010
Pentium IIIEB 1000 1067 1035 509 516 537 537 908
Athlon Tbird 800 1067 1040 677 677 516 510 942
Athlon Tbird 950 1067 1040 680 680 463 417 950
Duron 1000 1067 1043 680 680 463 414 951
Pentium 4 1900 1067 1043 981 980 979 967 1007
Athlon Tbird D 1466 2133 1744 755 756 666 666 1217
Pentium 4 D 1800 2133 1952 1455 1455 1401 1415 1641
Athlon Tbird D 1333 2133 1968 756 756 659 657 1219
Pentium 4 D 3066 2133 2021 1826 1819 1812 1818 1913
Athlon 4 D 1725 2400 2032 888 878 668 745 1172
Athlon XP D 2080 2667 2336 1171 1167 903 986 1549
Pentium 4 R 1700 3200 2336 1478 1471 1402 1429 1660
P4 Xeon R 2200 3200 2448 1537 1538 1511 1515 1822
Athlon 64 D 2000 3200 2932 2778 2736 2669 2663 2963
Opteron D 2000 3200 3136 2123 2129 2070 2110 2476
Pentium 4 R 2533 4267 3216 2078 2100 2075 2084 2358
Atom M D2 1600 6400 3280 3011 2958 2998 2953 3250
Pentium M DC 1862 4267 3328 2379 2375 2258 2294 2545
Core 2 Duo a DC2 2400 8533 3456 4312 4314 4194 4342 4860
Pentium 4 DC 2533 4267 3529 2576 2578 2451 2448 2742
Celeron C2 M DC2 2000 8533 3632 2550 2843 2607 3351 3493
Turion 64 M DC2 1900 8533 4112 2513 2555 2430 2484 2689
Core 2 Duo M DC2 1830 10667 4800 3738 3758 3604 3643 4464
Pentium 4E DC 3000 6400 4976 3613 3623 3432 3564 3895
Athlon 64 DC 2210 6400 4992 2793 2791 2704 2803 2941
Pentium 4 DC 3678 6272 5021 3375 3381 3249 3273 3723
Core 2 Duo b DC2 2400 8533 5376 4435 4402 4413 4342 5161
Core 2 Duo c DC2 2400 12800 6272 5051 5061 4961 4893 5720
Phenom II DC32 3000 21333 7208 5397 5393 5263 5262 6950
Core i7 DC32 3066 17067 11264 7845 7840 5410 7853 8290
Core i5 2467M DC3 2300 21333 12608 10245 9632 6570 9481 10258
Core i7 DC32 3466 21333 13600 9095 9204 6275 9421 9794
Core i7 4820K QC34 3900 51200 16472 13681 13670 13124 13696 14137
Key B L2 cache on memory bus # Example of poor results
H L2 at half CPU MHz or less R RDRAM
D DDR RAM DC Dual Channel DDR RAM
DC2 DDR 2 DC32 DDR 3 2 Channel
M Mobile CPU QC34 DDR 3 4 Channel
RandMem Benchmark
RandMem benchmark carries out eight tests at increasing data sizes to produce data transfer rates in MBytes Per Second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests for 32 bit integers and 64 bit floating point numbers.
The C/C++ program structure is as follows with array xi indexing via sequential or random numbers stored in the array.
Read - toti = toti & xi[xi[i+0]] | xi[xi[i+2]] & xi[xi[i+4]] |& to i+30
Read/write - xi[xi[i+2]] = xi[xi[i+0]]; repeated to i+30 and i+28
The main purpose is to demonstrate performance differences between sequential and random access when using the same CPU instructions, particularly the impact of burst reading (and writing) over a bus. In this case, with random access, 32 bytes or more will be read when only four are requested.
(see also
BusSpeed Benchmark).
Random speeds are also affected by lower level cache sizes.
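The read pattern described above can be sketched in C as follows. This is a minimal illustration, not the source code from randmem.zip: the function names fill_serial, fill_random and read_test are hypothetical, the loop is unrolled to i+6 rather than i+30, and the shuffle method is an assumption. The point is that both access orders run exactly the same instructions, and only the index values stored in xi differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Store indices 0..n-1 in order, so xi[xi[i]] walks memory serially. */
void fill_serial(unsigned int *xi, size_t n)
{
    for (size_t i = 0; i < n; i++)
        xi[i] = (unsigned int)i;
}

/* Shuffle the same indices (Fisher-Yates), so xi[xi[i]] jumps around memory.
   The use of rand() is an assumption made for this sketch. */
void fill_random(unsigned int *xi, size_t n, unsigned int seed)
{
    fill_serial(xi, n);
    if (n < 2)
        return;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        unsigned int t = xi[i];
        xi[i] = xi[j];
        xi[j] = t;
    }
}

/* Read test: indexed loads combined with alternating AND and OR.
   n is assumed to be a multiple of 8. */
unsigned int read_test(const unsigned int *xi, size_t n)
{
    unsigned int toti = 0xffffffffu;
    for (size_t i = 0; i + 7 < n; i += 8) {
        toti = (toti & xi[xi[i + 0]]) | (xi[xi[i + 2]] & xi[xi[i + 4]])
             | xi[xi[i + 6]];
    }
    return toti;
}
```

Timing read_test on the same array size after fill_serial and then after fill_random would give the serial and random figures for that size. With random indices, each load is likely to touch a different cache line, so most of each burst is wasted.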
A precompiled version of the benchmark can be found in
randmem.zip
which also contains the source code, providing further explanatory comments. Information on maximum speeds when different processing is involved can be obtained from
BusSpeed Benchmark Results
and
SSEfpu Benchmark Results.
Then
randmem results.htm
includes further details and comparisons, including those for multithreaded benchmark versions.
Below is an example of MB/second results from a Core i7 CPU, showing the effects of different cache sizes. Note the decrease in random access speeds as the data size grows, due to burst reading and the reducing benefits of caching.
Core i7 4820K mainly running at 3.9 GHz using Turbo Boost
32 GB 1600 MHz RAM over 4 channels, Windows 8.1
Integer....................... Double/Integer................
Serial........ Random........ Serial........ Random........
RAM Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
KB MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec
L1 6 24753 21240 24353 20950 27914 26690 27901 26866
12 24674 21377 24041 20986 28277 24369 28276 27232
24 24599 21373 24361 21586 28457 24246 28440 25932
L2 48 22414 20560 18133 12948 28389 24984 28045 22632
96 22465 20538 13834 8952 28354 24827 22114 13686
192 22480 20579 11814 7779 28353 24880 18659 12085
L3 384 21765 17461 7988 5917 26567 21036 14434 9949
768 21847 17211 6070 5018 26933 19937 10299 7930
1536 21853 17168 5439 4604 26452 20292 8886 7261
3072 21456 16651 3263 3165 26243 20120 8286 6868
6144 21383 16613 1607 1575 26209 20114 3338 3184
R 12288 13559 10997 1165 1137 18529 14306 2042 1965
24576 12429 10285 926 858 16547 12810 1575 1468
49152 12596 10358 758 702 16559 12756 1283 1192
98304 12572 10351 603 572 16509 12777 1059 1012
196608 12599 10363 510 492 16422 12752 834 818
393216 12573 10368 468 454 16403 12771 733 728
786432 12565 10383 442 429 16512 12775 687 685
R = RAM
Maximum speed 800 MHz x 2 DDR x 8 bus width x 4 channels = 51.2 GB/sec
Multiple cores need to be used for a higher throughput from RAM
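The maximum speed quoted above is simple arithmetic, sketched here in C. The function names are hypothetical and chosen only for this illustration. The second function expresses a measured speed as a percentage of the theoretical maximum.

```c
/* Theoretical peak RAM speed in MBytes/second:
   bus MHz x transfers per clock (2 for DDR) x bus width in bytes x channels. */
double peak_ram_mbytes_per_sec(double bus_mhz, int transfers_per_clock,
                               int bus_width_bytes, int channels)
{
    return bus_mhz * transfers_per_clock * bus_width_bytes * channels;
}

/* Measured speed as a percentage of the theoretical peak. */
double percent_of_peak(double measured_mbytes_per_sec, double peak_mbytes_per_sec)
{
    return 100.0 * measured_mbytes_per_sec / peak_mbytes_per_sec;
}
```

For the system above, peak_ram_mbytes_per_sec(800, 2, 8, 4) gives 51200 MB/second. The single core serial integer read speed of 12599 MB/second at 196608 KB is a little under 25% of that figure.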
Windows RandMem Results
Separate tables of speeds obtained via L1 cache, L2 cache and RAM are given below. Except when connected via the memory bus, performance via caches tends to be proportional to CPU MHz for a given type of processor. So, only a sample of results is provided. Details of cache sizes, speed and range of CPU MHz can be found in
PC CPU Specifications 1994 to 2014, plus Measured MIPS and MFLOPS per MHz.pdf.
RandMem L1 Cache Results in MBytes/Second
Integer Double/Integer
Serial Random Serial Random
CPU MHz Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
80486 DX2 66 63 80 69 87 47 65 51 80
Pentium 100 205 243 200 233 248 301 258 281
Pentium MMX 200 439 525 434 510 565 669 564 634
Pentium Pro 200 654 308 654 470 760 662 794 681
Pentium II 450 1471 1072 1508 1077 1745 1530 1801 1495
Celeron A 450 1496 1084 1511 1084 1757 1508 1761 1485
Pentium III 450 1500 1066 1482 1034 1702 1472 1719 1499
AMD K62 500 1114 1434 1131 1356 790 1575 841 1510
Celeron 2 566 1900 1375 1908 1357 2276 1928 2263 1882
Duron 700 1582 1730 1615 1727 2819 2320 2575 2253
Pentium IIIE 733 2460 1772 2462 1751 2909 2491 2928 2437
Athlon 800 1843 2025 1918 2017 2031 2401 1893 2444
Athlon Tbird 1000 2310 2514 2360 2471 4038 3310 3687 3256
Celeron M 1295 4620 3199 4511 3152 6404 4359 6666 4383
Atom M 1600 2639 3215 2722 3213 3398 3786 3437 3838
Pentium 4 1800 6361 3421 6559 2378 6139 5687 6138 3021
Ath4 Barton 1800 4068 4290 4077 4438 7377 5960 6654 5843
Core 2 Duo M 1830 4317 7669 6611 5123 8875 9348 9316 8444
Pentium M 1862 6586 4612 6701 4584 9793 6304 9771 6288
Pentium 4 1900 6553 3667 6788 2511 6361 6188 6443 3192
Turion 64 M 1900 4691 5222 4776 4965 7891 6653 7569 6660
Opteron 2000 4514 4909 4532 4922 8063 6609 7421 6464
Celeron C2 M 2000 6884 7227 7095 5034 10163 10333 6852 7987
Athlon XP 2080 4728 5215 4755 5158 8268 6830 7618 6800
Athlon 64 2210 5554 6072 5532 6129 9772 7799 9165 7724
Core i5 2467M 2300 7800 7822 8834 7978 10059 9427 10114 10698
Core 2 Duo 1 CP 2400 8821 9518 8806 7379 12415 12690 12405 12464
Pentium 4E HT 3000 9620 5664 9840 3460 8015 7874 8894 4655
Pentium 4 3000 10397 5781 10768 3830 10230 9448 10255 4938
Core i7 3060 10809 11713 10802 12145 14813 14343 14405 15544
Phenom II 3000 12252 8269 11570 8222 15567 10000 15514 10664
Core i7 3460 12122 7425 12505 6818 16279 9503 16598 10807
Pentium 4 3678 12630 7668 13268 4703 12561 11942 12478 6096
Core i7 4820K 3900 24674 21377 24041 20986 28277 24369 28276 27232
MIPS multiply by 0.55 0.37 0.55 0.37 0.28 0.31 0.28 0.31
RandMem L2 and L3 Cache Results in MBytes/Second
Integer Double/Integer
Serial Random Serial Random
CPU MHz Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
80486 DX2 66 23 17 11 12 22 15 12 14
Pentium 100 96 73 32 40 89 69 35 46
Pentium MMX 200 195 135 86 93 183 132 94 110
Pentium Pro 200 487 269 208 132 613 310 357 207
Pentium II 450 700 325 313 136 559 398 323 177
Celeron A 450 994 769 287 233 912 813 319 309
Pentium III 450 801 335 303 141 794 406 526 230
AMD K62 500 400 182 72 55 425 221 111 77
Celeron 2 566 1505 1222 373 356 1186 1376 388 426
Duron 700 1143 1073 678 718 1210 1313 1203 1390
Pentium IIIE 733 2060 1674 1531 993 2593 1827 2275 1479
Athlon 800 840 576 610 320 1169 864 1193 1048
Athlon Tbird 1000 1636 1543 976 1028 2520 1906 2454 2066
Celeron M 1295 3386 2678 1930 1009 4447 3183 3115 1649
Atom M 1600 2160 2306 718 944 2775 2584 1208 1455
Pentium 4 1800 4143 2129 2621 1901 6541 5023 4903 2313
Ath4 Barton 1800 2968 2819 1571 1814 4525 3220 4378 3733
Core 2 Duo M 1830 5793 6735 3061 2717 7520 6418 5412 4198
Pentium M 1862 4833 4132 2807 1458 6733 4965 4393 2371
Pentium 4 1900 5115 2215 2745 1965 6786 3036 4713 2437
Turion 64 M 1900 2804 2671 2486 2393 4426 3994 4797 4140
Opteron 2000 3128 3198 2881 2731 5222 3671 5249 4402
Celeron C2 M 2000 6213 7155 3319 3006 8788 7702 6050 4428
Athlon XP 2080 3458 3311 2054 2112 5232 3931 5083 4419
Athlon 64 2210 4070 3734 3322 3257 6140 4420 6124 5218
Core i5 2467M 2300 8593 7538 5300 3390 11588 8536 7796 5175
Core 2 Duo 1 CP 2400 7752 8989 4112 3655 10739 9632 7335 5771
Pentium 4E HT 3000 6892 3073 3482 2541 7855 4821 6899 3250
Pentium 4 3000 8104 3238 4291 3117 9936 6036 8324 3856
Core i7 3060 10156 10801 5895 5623 13359 12881 9894 9110
Phenom II 3000 10549 7860 6381 5215 15308 9662 14830 9879
Core i7 3460 11111 6666 5911 5429 13574 8977 10187 8073
Pentium 4 3678 9894 4533 5166 3785 12423 9155 9174 4396
Core i7 4820K 3900 22465 20538 13834 8952 28354 24827 22114 13686
L3 Cache at 3072 KB i5 1536 KB
Phenom II 3000 7874 6680 1077 1017 9428 8358 2048 2045
Core i5 2467M 2300 7064 5632 2243 1904 10357 7834 3927 2977
Core i7 3060 9718 9846 2364 2312 12661 11345 5207 4408
Core i7 3460 9762 6331 2378 2620 14411 9396 5608 4601
Core i7 4820K 3900 21853 17168 5439 4604 26452 20292 8886 7261
RandMem RAM Speed Results in MBytes/Second at 6.1 MB
The 6.1 MB data size was chosen as the standard to provide appropriate comparisons of random access speeds, which reduce as the amount of memory used increases.
Integer Double/Integer
Serial Random Serial Random
CPU MHz Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
80486 DX2 66 21 10 6 7 17 10 8 9
Pentium 100 55 35 11 14 52 40 19 22
Pentium MMX P0 200 121 83 23 26 111 78 31 36
Pentium Pro P0 200 129 75 32 21 142 84 53 16
AMD K62 P1 500 132 87 18 15 130 94 26 20
Celeron A P0 300 232 115 62 41 138 151 77 57
Duron P2 700 247 193 36 30 393 343 49 46
Athlon Tbird P2 1000 249 207 38 33 488 358 62 53
Celeron 2 P0 566 276 167 80 53 253 192 93 70
Athlon P2 800 250 191 38 33 371 323 54 51
Pentium II P1 450 300 152 89 61 196 194 107 79
Pentium III P1 450 329 167 98 68 350 233 169 112
Ath4 Barton D1 1800 383 265 69 48 559 343 115 77
Athlon 4 D1 1667 453 426 129 93 699 573 222 149
Pentium IIIEB P2 1000 469 257 142 97 513 344 215 156
Pentium IIIEB P2 733 474 204 96 66 391 251 123 88
Athlon XP D2 2080 884 727 183 116 1224 880 311 187
Pentium 4 P2 1900 940 387 48 42 914 483 76 64
Celeron M 1295 1029 456 89 55 1467 632 144 93
Pentium 4 R1 1400 1324 689 107 84 1123 912 159 118
Pentium 4 D1 1800 1394 630 98 80 1658 803 168 123
Opteron D3 2000 1536 1377 121 111 2297 1822 235 217
Pentium 4 D1 2533 1561 599 75 58 1623 738 118 90
Pentium 4 D1 3066 1737 655 70 51 1718 785 125 81
Turion 64 M DC3 1900 1758 1392 247 191 2222 1704 430 304
Pentium 4 R2 2533 1968 1019 172 145 2919 1352 297 220
Atom M DD2 1600 2058 1072 52 81 2283 1392 84 127
Pentium M DC1 1862 2073 787 340 213 2442 1238 616 376
Athlon 64 D3 1995 2100 965 156 122 2520 1432 291 225
Athlon 64 DC2 2210 2145 1451 248 159 3008 1785 402 254
Pentium 4 DC1 2533 2335 847 98 72 2303 978 166 114
Celeron C2 M DC3m 2000 3000 1212 302 183 3027 1455 514 311
Pentium 4 DC2 3678 3150 1850 181 124 4115 2103 294 196
Core 2 Duo M DC3M 1830 3384 1524 459 296 3349 1864 849 534
Pentium 4E HT DC2 3000 3523 1736 182 141 3569 2092 325 224
Core 2 Duo 1CP DC3b 2400 4854 2605 789 597 5532 3799 1486 1309
Core 2 Duo 1CP DC3a 2400 4947 770 349 208 1685 1052 932 557
Core 2 Duo 1CP DC3c 2400 5136 2775 878 657 6086 4041 1637 1396
Phenom II $C DC33 3000 6120 6079 747 654 9065 7991 1395 1220
Core i5 2467M DC33 2300 6127 5396 484 458 7722 6141 825 786
Core i7 $C DC32 3060 7261 5273 953 854 7008 5650 1665 1483
Core i7 $C DC33 3460 7811 5110 1071 870 8036 5998 1652 1742
Core i7 $C QC34 3900 21383 16613 1607 1575 26209 20114 3338 3184
Core i7 12.3MB QC34 3900 13559 10997 1165 1137 18529 14306 2042 1965
Maximum 13559 10997 1165 1137 18529 14306 2042 1965
Key P0 66 MHz P1 100 MHz
P2 133 MHz D1 DDR 133 MHz
D2 DDR 166 MHz D3 DDR 200 MHz
DC1 Dual Channel DDR 133 MHz DC2 Dual Channel DDR 200 MHz
DC3a DDR2 533 MHz nForce 570 chipset DC3b DDR2 533 MHz Intel 965 chipset
DC3c DDR2 800 MHz Intel 965 chipset DC3M DDR2 666 MHz Mobile CPU
DC3m DDR2 533 MHz Mobile CPU DC33 DDR3 1333 MHz
DC32 DDR3 1066 MHz QC34 DDR3 1600 MHz 4 Channels
R1/R2 RDRAM 400/533 MHz $C 6.1 MB Mainly or all L3 cache
SSEfpu Benchmark
SSE3DNow is a Windows benchmark that carries out similar calculations to
MemSpeed,
but uses floating point Single Instruction Multiple Data (SIMD) functions,
via assembly code instructions, plus some tests using normal C/C++ compilations.
The benchmark and source code are available in
sse3dnow.zip,
and further details and results are in
sse3dnow results.htm.
A 64 bit version is also available in
more64bit.zip.
3DNow functions are only available on AMD CPUs, using MMX registers. SSE deals with four single precision numbers in 128 bit registers, also used for two at double precision with SSE2.
Results are given as Millions of Bytes Per Second (MB/s) memory reading speed. On modern systems, this speed tends to be the same for SSE and SSE2 calculations, so SSE, with four results per register instead of two, achieves twice the floating point execution rate of SSE2.
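As an illustration of four single precision calculations per 128 bit register, the s=s+x[m]*y[m] test can be sketched with compiler intrinsics. This is not the benchmark's assembly code. The function name dot_sse is hypothetical, the array length is assumed to be a multiple of four, and a plain scalar loop is used where the compiler does not report SSE support.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* s = s + x[m] * y[m] over n floats; n is assumed to be a multiple of 4. */
float dot_sse(const float *x, const float *y, size_t n)
{
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();            /* four partial sums */
    for (size_t m = 0; m < n; m += 4) {
        __m128 xv = _mm_loadu_ps(x + m);      /* load 4 floats, unaligned allowed */
        __m128 yv = _mm_loadu_ps(y + m);
        acc = _mm_add_ps(acc, _mm_mul_ps(xv, yv));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                /* combine the four partial sums */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    float s = 0.0f;                           /* scalar fallback */
    for (size_t m = 0; m < n; m++)
        s += x[m] * y[m];
    return s;
#endif
}
```

Each loop pass reads 32 bytes (four floats from each array) and performs eight floating point operations. An SSE2 double precision version would read the same 32 bytes for only four operations.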
Following is an example of logged results on a 2014 Core i7 CPU. This also shows the conversion factors for MB/second to MFLOPS. Using SSE, this processor is capable of producing four single precision results per clock cycle, at 15.6 GFLOPS, or eight per cycle, at 31.2 GFLOPS, with linked add and multiply operations. The measured maximum here was lower, at 10.375 GFLOPS, and would need more register based calculations within a loop to improve the score. On the other hand, measured performance is around four times faster than
MemSpeed.
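The peak figures quoted above follow from clock speed multiplied by results per clock cycle. A short C sketch, using hypothetical function names, shows the arithmetic.

```c
/* Peak MFLOPS = CPU MHz x floating point results per clock cycle. */
double peak_mflops(double cpu_mhz, int results_per_clock)
{
    return cpu_mhz * results_per_clock;
}

/* Fraction of the peak actually measured, between 0 and 1. */
double fraction_of_peak(double measured_mflops, double peak)
{
    return measured_mflops / peak;
}
```

At 3900 MHz, four results per clock gives 15600 MFLOPS and eight gives 31200 MFLOPS. The measured 10375 MFLOPS is about two thirds of the four per clock figure.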
Core i7 4820K mainly running at 3.9 GHz using Turbo Boost
32 GB 1600 MHz RAM over 4 channels, Windows 8.1
Memory s=s+x[m]*y[m] x[m]=x[m]+y[m]
KBytes SSE2 SSE 3DNow Sngl SSE2 SSE 3DNow Sngl
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
L1 4 40800 40863 0 19658 80200 80443 0 12830
8 41176 41220 0 19265 78408 79958 0 12458
16 41395 41403 0 19262 80913 81351 0 12372
32 41457 41496 0 19097 81148 81959 0 12328
L2 64 41384 41355 0 17867 46570 46650 0 12088
128 41456 41498 0 17722 49062 49192 0 12148
256 41198 41249 0 17749 47763 48167 0 12074
L3 512 35492 35350 0 17968 30461 30443 0 11957
1024 35503 35502 0 17985 30385 30418 0 11945
2048 35809 35807 0 18098 30709 30392 0 12006
4096 35048 35043 0 17588 30166 30182 0 11798
8192 31740 32291 0 16997 27356 27306 0 11635
R 16384 20609 20109 0 13800 15237 15076 0 9988
32768 19916 19732 0 13448 14515 14535 0 9866
65536 19943 19678 0 13694 14531 14720 0 9889
131072 19962 19652 0 13433 14471 14497 0 10016
262144 19765 19716 0 13453 14696 14541 0 9917
524288 19809 19765 0 13479 14728 14542 0 9997
1048576 20093 19736 0 13588 14515 14688 0 9913
R = RAM
Divide DP SP SP SP DP SP SP SP
Maximum
MB/S by 8 4 4 16 8 8
MAX MFLOPS 5182 10375 0 4915 5072 10245 0 1604
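The MFLOPS figures can be reproduced from the MB/second columns. The sketch below assumes the divisors represent bytes read per floating point operation: s=s+x[m]*y[m] reads two elements for two operations (multiply and add), and x[m]=x[m]+y[m] reads two elements for one operation. This interpretation is inferred from the ratios in the table and is not stated in the original. The function name is hypothetical.

```c
/* Convert memory reading speed to MFLOPS, assuming each loop step reads
   elements_read values of bytes_per_element bytes and performs
   flops_per_step floating point operations. */
double mbps_to_mflops(double mbytes_per_sec, int bytes_per_element,
                      int elements_read, int flops_per_step)
{
    double bytes_per_flop =
        (double)(bytes_per_element * elements_read) / flops_per_step;
    return mbytes_per_sec / bytes_per_flop;
}
```

For example, the SSE2 maximum of 41457 MB/second for s=s+x[m]*y[m], with 8 byte elements, is divided by 8 to give about 5182 MFLOPS. The SSE maximum of 81959 MB/second for x[m]=x[m]+y[m], with 4 byte elements, is also divided by 8 to give about 10245 MFLOPS.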
Windows SSEfpu Results
Separate tables of speeds obtained via L1 cache, L2 cache and RAM follow.
The results include some for SSE64, the 64 bit version, in
more64bit.zip.
The i387 normal floating point and 3DNow tests are not included, as the instructions are not supported at 64 bits. The SSE/SSE2 assembly code is the same as the original SSE3DNow, leading to no apparent difference in performance.
SSEfpu L1 Cache Results in MBytes/Second
|---- s=s+x[m]*y[m] -----| |---- x[m]=x[m]+y[m] ----|
CPU MHz SSE2 SSE 3DNow Sngl SSE2 SSE 3DNow Sngl
80486 DX2 66 0 0 0 21 0 0 0 21
Pentium 200 0 0 0 259 0 0 0 143
Pentium MMX 200 0 0 0 314 0 0 0 178
Pentium Pro 200 0 0 0 523 0 0 0 273
AMD K62+ 500 0 0 1939 572 0 0 3325 344
Celeron A 450 0 0 0 1174 0 0 0 609
Pentium II 450 0 0 0 1197 0 0 0 636
Pentium III 550 0 3511 0 1451 0 3341 0 768
Pentium IIIEB 800 0 5029 0 2076 0 4792 0 1047
Atom M 1600 3015 7232 0 2099 6189 12168 0 1155
Pentium 4 1900 9285 9338 0 2583 9331 9350 0 1237
Duron 750 0 0 5001 2601 0 0 6200 1330
P4 Xeon 2200 10738 10729 0 2805 11499 11505 0 1392
Pentium 4 2533 12389 12440 0 3223 12922 12853 0 1640
Pentium 4 3066 15116 15035 0 3912 15397 15288 0 2099
Athlon Tbird 1200 0 0 7865 4082 0 0 9721 2084
Celeron M 1295 9257 9891 0 4675 10025 9634 0 2006
Pentium 4 3678 18107 18084 0 4688 19201 19269 0 2391
Athlon XP 1400 0 10671 9344 4866 0 10262 11641 2443
Pentium 4E 3000 17765 17511 0 5758 20830 20605 0 3197
Core 2 Duo M 1830 18704 18482 0 6074 28235 28306 0 3429
Athlon XP 1733 0 13466 11998 6090 0 13727 14566 3088
Ath4 Barton 1800 0 13470 11916 6245 0 13975 14791 3200
Turion 64 M 1900 13697 14236 11785 6347 14693 14563 11420 3317
Pentium M 1862 13363 14338 0 6730 14463 14460 0 2893
Celeron C2 M 2000 19513 19481 0 6877 30564 29729 0 4051
Opteron 1990 15240 15232 13616 6928 15670 15668 12589 3523
Athlon 64 1995 15355 15245 13722 6985 15796 15794 12614 3552
Athlon XP 2080 0 15635 13836 7267 0 16237 17303 3698
Athlon 64 64 bit 2210 15988 15993 16179 17121
Athlon 64 2210 16382 16280 14518 7678 17531 17320 13931 4020
Core 2 Duo 64 bit 2400 25326 25389 37440 37532
Core 2 Duo 1 CP 2400 25371 25340 0 8768 37973 37972 0 4717
Core i5 2467M 2300 23843 23716 0 11390 44512 44790 0 6805
Phenom II 64 bit 3000 22235 21441 34485 45329
Phenom II 3000 23213 23217 19426 11433 45662 45826 25700 5373
Core i7 3060 30731 30730 0 12116 45838 45849 0 6071
Core i7 3460 35467 36212 0 13633 53307 46427 0 4414
Core i7 64 bit 3900 41037 41038 78440 78254
Core i7 4820K 3900 40800 40863 0 19658 80200 80443 0 12830
Maximum 41037 41038 19426 19658 80200 80443 25700 12830
Maximum MFLOPS 5130 10260 4857 4915 5013 10055 3213 1604
SSEfpu L2 and L3 Cache Results in MBytes/Second
|---- s=s+x[m]*y[m] -----| |---- x[m]=x[m]+y[m] ----|
CPU MHz SSE2 SSE 3DNow Sngl SSE2 SSE 3DNow Sngl
80486 DX2 66 0 0 0 13 0 0 0 11
Pentium 200 0 0 0 138 0 0 0 98
Pentium MMX 200 0 0 0 178 0 0 0 118
Pentium Pro 200 0 0 0 481 0 0 0 206
AMD K62+ 500 0 0 1180 467 0 0 1342 300
Celeron A 450 0 0 0 806 0 0 0 574
Pentium II 450 0 0 0 711 0 0 0 386
Pentium III 550 0 2338 0 1235 0 1830 0 739
Pentium IIIEB 800 0 3186 0 1874 0 2614 0 1016
Atom M 1600 2544 4303 0 1800 4441 4805 0 1053
Pentium 4 1900 9362 9149 0 2418 7206 7053 0 959
Duron 750 0 0 1916 1169 0 0 1579 730
P4 Xeon 2200 10168 10209 0 2624 7976 7991 0 1529
Pentium 4 2533 11791 11829 0 3086 9406 9383 0 1784
Pentium 4 3066 14187 13063 0 3624 11031 11020 0 2117
Athlon Tbird 1200 0 0 3258 2024 0 0 2530 1134
Celeron M 1295 5661 5686 0 3635 4319 4307 0 1641
Pentium 4 3678 16454 17103 0 4324 13092 13213 0 2642
Athlon XP 1400 0 2330 2474 1804 0 2403 2785 1320
Pentium 4E 3000 16562 16632 0 4839 13078 13045 0 3024
Core 2 Duo M 1830 12608 12777 0 5944 12157 11768 0 3067
Athlon XP 1733 0 4600 4671 2640 0 3476 3461 1708
Ath4 Barton 1800 0 4673 4779 2615 0 3537 3517 1754
Turion 64 M 1900 5267 5547 4876 3304 3239 3124 3270 1779
Pentium M 1862 8225 8222 0 5282 6027 5952 0 2435
Celeron C2 M 2000 14008 13687 0 6925 13565 13565 0 3098
Opteron 1990 7223 7409 6686 3819 4036 4038 4286 1953
Athlon 64 1995 7200 7467 6707 3849 3736 3742 4068 1891
Athlon XP 2080 0 5577 5828 3276 0 4227 4184 2065
Athlon 64 2210 7822 7802 7522 4186 4792 4939 4851 2226
Athlon 64 64 bit 2210 8471 7930 5940 5916
Core 2 Duo 1 CP 2400 16839 17072 0 8334 16389 15920 0 4111
Core 2 Duo 64 bit 2400 18281 18536 17041 17065
Core i5 2467M 2300 23974 23578 0 11131 27176 28358 0 6605
Phenom II 3000 23022 23039 16241 11246 17541 17394 14997 4965
Phenom II 64 bit 3000 23409 23357 18237 18163
Core i7 3060 27558 27566 0 11019 25457 25367 0 5509
Core i7 3460 31812 28222 0 12454 28270 29044 0 4818
Core i7 64 bit 3900 41560 41640 50611 50462
Core i7 4820K 3900 41456 41498 0 17722 49062 49192 0 12148
Maximum 41560 41640 16241 17722 50611 50462 14997 12148
Maximum MFLOPS 5195 10410 4060 4430 3163 6308 1875 1518
L3 Cache
Core i5 2467M 2300 21091 20073 0 10959 18897 18205 0 6606
Phenom II 3000 10205 10746 8609 7401 9510 9535 8595 4673
Core i7 3060 22999 23037 0 10800 18395 18390 0 5472
Core i7 3460 25824 23662 0 12348 20689 21053 0 4525
Core i7 64 bit 3900 36080 36324 31720 31725
Core i7 4820K 3900 35048 35043 0 17588 30166 30182 0 11798
SSEfpu RAM Speed Results in MBytes/Second
|---- s=s+x[m]*y[m] -----| |---- x[m]=x[m]+y[m] ----|
CPU MHz SSE2 SSE 3DNow Sngl SSE2 SSE 3DNow Sngl
80486 DX2 66 0 0 0 13 0 0 0 9
Pentium 200 0 0 0 77 0 0 0 60
Pentium MMX 200 0 0 0 110 0 0 0 84
Pentium Pro 200 0 0 0 128 0 0 0 82
AMD K62+ 500 0 0 196 175 0 0 135 126
Athlon Tbird P2 1200 0 0 417 215 0 0 340 185
Duron P2 750 0 0 525 284 0 0 379 215
Pentium II P1 450 0 0 0 291 0 0 0 166
Celeron A P1 450 0 0 0 335 0 0 0 178
Pentium III P1 550 0 670 0 359 0 297 0 175
Pentium IIIEB P2 800 0 836 0 402 0 421 0 257
Ath4 Barton #D1 1800 0 592 582 553 0 399 385 395
Core 2 Duo A #DC3 2040 717 717 0 811 633 633 0 644
Athlon XP D1 1400 0 1118 1040 886 0 730 677 637
Pentium 4 P2 1900 974 975 0 945 594 595 0 588
Athlon XP D1 1733 0 1371 1381 1116 0 1061 921 928
Athlon XP D2 2080 0 1391 1364 1212 0 999 923 851
Athlon XP D2 2170 0 1631 1625 1338 0 1240 1093 1121
Celeron M 1295 1504 1517 0 1345 822 832 0 758
Pentium 4 D1 2533 1415 1386 0 1401 847 841 0 840
Atom M DD2 1600 2303 2854 0 1660 1697 1762 0 999
Pentium 4 D1 2533 1852 1843 0 1750 626 628 0 616
Pentium 4 D2 3066 1883 1878 0 1802 1034 1034 0 1000
P4 Xeon R1 2200 2427 2426 0 1968 1240 1240 0 1087
Pentium 4 DC1 2533 2187 2180 0 2018 1286 1280 0 1169
Opteron D3 1990 2601 2605 2548 2322 2061 2044 2112 1567
Pentium 4 R2 2533 3504 3494 0 2480 1743 1746 0 1440
Turion 64 M DC3 1900 2862 2858 2805 2419 2052 2118 2140 1507
Pentium M DC1 1862 2399 2331 0 2491 1380 1381 0 1340
Athlon 64 D3 1995 2688 2711 2656 2564 1564 1558 1559 1478
Athlon 64 64b DC2 2210 3166 3193 2044 2043
Athlon 64 DC2 2210 3325 3329 3339 2935 2074 2080 2071 1804
Celeron C2 M DC3 2000 3096 3100 0 3080 1926 1726 0 1901
Pentium 4E DC2 3000 3639 3672 0 3240 2383 2380 0 2259
Pentium 4 DC2 3678 4408 4389 0 3498 2648 2639 0 2102
Core 2 Duo M DC4 1830 4388 4425 0 4144 2422 2521 0 2320
Phenom 64 bit DC7 3000 6329 6279 4680 4460
Phenom II DC7 3000 6511 6538 5083 4308 4700 4773 4576 3572
Core 2 Duo B DC3 2400 4904 4895 0 4752 3131 3134 0 3015
Core 2 64 bit DC5 2400 5628 5630 3617 3713
Core 2 Duo C DC5 2400 5777 5749 0 5157 3866 3823 0 3371
Core i7 DC6 3060 9196 9191 0 6561 7035 7049 0 4454
Core i7 DC7 3460 12467 10690 0 7739 8357 8264 0 4486
Core i5 2467M DC7 2300 14105 12928 0 8969 10917 10781 0 5933
Core i7 64 bit DC7 3900 19410 19406 13895 13472
Core i7 4820K QC8 3900 20093 19736 0 13588 14515 14688 0 9913
Maximum 20093 19736 5083 13588 14515 14688 4576 9913
Maximum MFLOPS 2512 4934 1271 3397 907 1836 572 1239
Key P1 100 MHz P2 133 MHz
D1 DDR 133 MHz D2 DDR 166 MHz
D3 DDR 200 MHz DC1 Dual Channel DDR 133 MHz
DC2 Dual Channel DDR 200 MHz DC3 DDR2 533 MHz
DC4 DDR2 666 MHz DC5 DDR2 800 MHz
DC6 DDR3 1066 MHz DC7 DDR3 1333 MHz
QC8 DDR3 1600 MHz 4 Channels
R1 RDRAM 400 MHz R2 RDRAM 533 MHz
# Slow speed example 64b 64 bit compilation
C2D A # nForce 570 chipset C2D B/C Intel 965 Chipset
FFT Benchmarks
The FFT benchmarks started life in early 2000, based on a program from Scott Taylor of DSP Systems Inc. The Windows versions were titled FFTGraf. Three of them were produced, each providing graphical output. The first was optimised, all C code. The second was further optimised, including assembly language. The third had SSE SIMD assembly code and further tuning changes. Further details can be found in
fftgraf results.htm.
The benchmarks and source codes can be downloaded from
fftgraf.zip.
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run a number of times to identify variance. Besides the graph, results are displayed and saved in a log file, with FFT running time in milliseconds. An example of results is shown below. As shown, some checks of numeric calculations are carried out on the largest FFTs. These are subject to variation due to different rounding effects.
The latest versions are all C code, with text output only. FFT1 is the original and FFT3c is the third version, with rearranged C statements instead of assembly code. They are provided as 32 bit and 64 bit versions to run via Windows, Linux and Android.
Further details and results are in
FFTBenchmarks.htm.
An example of the latest 64 bit benchmark is also provided below, for the same Core i7, via Windows. Note similar performance to FFTGraf and different sumchecks.
FFTGraf Example Log File Core i7 4820K mainly running at 3.9 GHz
FFTGraf Test Version 3.00 Sun Sep 24 17:08:35 2017
By Roy Longbottom via Scott Taylor's code and now SSE, SSE2, 3DNow
Windows NT Version 6.2, build 9200,
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000306E4, 3711 MHz
From GlobalMemoryStatus: Size 2097151 KB, Free 2097151 KB
5 Passes Min 1 K Max 1024 K Max Seconds 15
Size Single Precision FFTs using SSE
K Millisecond each pass
1 0.015 0.011 0.011 0.011 0.011
2 0.023 0.023 0.023 0.023 0.023
4 0.050 0.050 0.050 0.050 0.050
8 0.125 0.125 0.140 0.125 0.126
16 0.323 0.309 0.316 0.308 0.308
32 0.707 0.687 0.689 0.698 0.688
64 1.55 1.48 1.48 1.48 1.48
128 3.29 3.20 3.21 3.20 3.20
256 7.30 6.97 6.97 6.96 6.96
512 16.4 16.1 16.1 16.3 16.1
1024 39.3 38.4 38.7 38.7 38.4
Size Double Precision FFTs using SSE2
K Millisecond each pass
1 0.014 0.014 0.014 0.014 0.013
2 0.030 0.030 0.030 0.030 0.030
4 0.071 0.071 0.071 0.071 0.071
8 0.154 0.155 0.153 0.156 0.154
16 0.403 0.400 0.410 0.400 0.400
32 0.888 0.869 0.868 0.885 0.869
64 1.87 1.85 1.88 1.86 1.85
128 3.98 3.95 3.96 3.96 3.95
256 9.02 8.93 8.90 8.87 8.87
512 22.7 22.6 22.9 22.7 22.6
1024 57.0 55.5 55.5 55.4 55.5
Checks SP 9.999890e-001 3.338029e-006 1.043487e-011
Checks DP 1.000000e+000 1.133294e-023 1.428096e-028
End FFT Test Sun Sep 24 17:08:38 2017
FFT 64 Bit Benchmark Version 3c.0 Mon Sep 25 10:54:58 2017
Size milliseconds
K Single Precision Double Precision
1 0.019 0.013 0.012 0.013 0.012 0.012
2 0.029 0.026 0.026 0.028 0.028 0.027
4 0.063 0.059 0.059 0.071 0.070 0.070
8 0.153 0.143 0.142 0.165 0.164 0.164
16 0.364 0.335 0.334 0.385 0.384 0.384
32 0.750 0.725 0.735 0.814 0.814 0.815
64 1.670 1.555 1.556 1.751 1.765 1.750
128 3.528 3.375 3.386 3.778 3.748 3.748
256 7.663 7.331 7.342 8.772 8.898 8.727
512 17.683 17.250 17.229 23.083 22.583 22.768
1024 43.607 42.397 42.613 58.202 56.390 55.824
1024 Square Check Maximum Noise Average Noise
SP 9.999520e-001 3.346482e-006 4.565234e-011
DP 1.000000e+000 1.133294e-023 1.428110e-028
Windows FFT Results
Below is an example of the graph produced by FFTGraf, running on a 3900 MHz Core i7. It is automatically scaled at run time and is based on milliseconds per K FFT size to reduce the range, as opposed to milliseconds, where the slowest can be thousands of times greater than the fastest. The graph also indicates the range of memory address space used.
Following this is a table showing the number of floating point operations used at each FFT size and the calculations of MFLOPS for the three versions of FFTGraf (FP op count/1000/milliseconds from the later tables). The original version comprised compiled C code. Analysis of the data flow identified that access was largely dependent on skipped sequential addresses. The second version included assembly code for the critical calculations, data loading in segments into L2 cache and optimised use of burst reading of data. The third version made use of SSE or 3DNow SIMD instructions. As seen in the table, performance of the larger FFTs could be increased by more than three times. Gains of more than five times were noted on earlier PCs.
After the above are keys for cache and RAM sizes used on systems identified in the later detailed tables.
3900 MHz Core i7 MFLOPS - from original FFTGraf results
FFT1 FFT2 FFT3
FFT size FP op count SP DP SP DP SP DP
1024 53312 2539 2318 3332 3136 5331 4101
2048 116864 2486 2164 3437 3075 5312 3895
4096 254080 2310 1815 3630 2823 5082 3630
8192 549120 1961 1771 3230 2890 4576 3661
16384 1179904 1967 1686 2950 2510 4069 2950
32768 2523648 1682 1262 2804 2524 3605 2804
65536 5374464 1311 1221 2829 2443 3839 2829
131072 11404288 1267 1267 2782 2479 3801 2851
262144 24118272 1269 1049 2680 2412 3445 2680
524288 50857984 1060 892 2543 1956 3391 2312
1048576 106956800 947 557 2183 1725 2891 1945
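The MFLOPS values in the table above follow from the stated formula, FP op count/1000/milliseconds. A one line C function, with a hypothetical name, shows the calculation.

```c
/* MFLOPS = floating point operation count / 1000 / running time in ms. */
double fft_mflops(double fp_op_count, double milliseconds)
{
    return fp_op_count / 1000.0 / milliseconds;
}
```

For the 1K single precision FFT1 case, 53312 operations in 0.021 milliseconds gives about 2539 MFLOPS. For the 1024K case, 106956800 operations in 113 milliseconds gives about 947 MFLOPS.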
Cache & RAM Key
L1 and L2 cache size e.g. 16 = 8 KB L1 and 256 KB L2
1 = 8 KB 2 = 16 KB 3 = 32 KB 4 = 64 KB 5 = 128 KB
6 = 256 KB 7 = 512 KB 8 = 1 MB 9 = 2 MB A = 4 MB
H = 24 KB
Z = 512 KB + 6 MB X = 256 KB + 8 MB W = 256 KB + 3 MB, V = 256 KB + 10 MB
B = L2 on memory bus F = At CPU MHz H = Half CPU MHz
Bus/Memory Speed
Numbers 33, 50, 66, 100, 133 = MHz
DD1 = DDR at 133 MHz DC1 = Dual Channel DDR at 133 MHz
DD2 = DDR at 166 MHz DC2 = Dual Channel DDR at 166 MHz
DD3 = DDR at 200 MHz DC3 = Dual Channel DDR at 200 MHz
RD2 = RDRAM 400 MHz RD1 = One Channel RDRAM 400 MHz
RD3 = RDRAM 533 MHz DC4 = DDR2 533 MHz DC5 = DDR2 666 MHz
DC6 = DDR2 800 MHz DC7 = DDR3 1066 MHz DC8 = DDR3 1333 MHz
QC9 = DDR3 1600 MHz 4 channel SCC = DDR2 533 MHz single channel
# = Particularly slow memory S - last column - uses SSE or SSE2 instructions
FFTGraf Version 1
Single Precision Milliseconds
Cache FFT Size K --->
Processor MHz & RAM 1 2 4 8 16 32 64 128 256 512 1024
80486 66 15B 33 17 39 85 196 509 1240 2752 5864 12427
Pentium 100 16B 50 3.0 9.7 22 54 127 307 801 1790 3844
Pentium MMX 200 27B 66 1.2 3.1 11 24 52 119 277 807 1806 3844
Pentium Pro 200 16F 66 1.1 2.9 6.4 14 37 101 358 797 1740 3717
Celeron A 400 25F 66 0.36 0.85 2.6 7.5 36 106 254 569 1188 2543 5356
Pentium II 450 27H 100 0.32 0.86 4.1 9.2 20 47 132 395 985 2257 4627
Pentium IIIE 550 26F 100 0.26 0.60 1.6 3.5 12 34 134 309 684 1461 3313
Pentium IIIEB 733 26F 133 0.19 0.46 1.2 2.6 6.2 27 128 291 626 1377 2876
Pentium IIIEB 1000 26F 133 0.14 0.33 0.82 1.8 4.6 33 122 300 657 1414 3029
Pentium IIIEB 1000 26F RD1 0.14 0.33 0.82 1.8 4.2 16 91 216 478 1029 2126
Pentium 4 1500 16F RD2 0.14 0.33 0.77 1.7 4.3 17 93 235 565 1296 2809
Pentium 4 1900 16F 133 0.11 0.27 0.60 1.4 3.4 18 172 402 907 1985 4214
P4 Xeon 2200 17F RD2 0.093 0.23 0.53 1.2 3.0 7.4 31 194 480 1121 2435
Celeron M 1295 38F 0.089 0.20 0.49 1.4 3.0 6.6 15 75 584 1379 3121
Pentium 4E 3000 28F DC3 0.072 0.15 0.38 0.83 1.8 4.2 10 40 226 494 1043
Pentium 4N 3066 17F DD1 0.067 0.17 0.37 0.84 2.1 5.3 32 268 617 1368 2877
Pentium M2 1862 39F DC1 0.063 0.14 0.34 0.94 2.1 4.5 10 24 78 452 1266
Atom M 1600 H7F SCC 0.53 0.57 1.3 3.0 6.5 15 51 228 506 1095 2241
Core 2 Duo M 1830 39F DC5 0.078 0.19 0.34 0.94 2.2 4.8 11 24 80 318 814
Celeron C2 M 2000 38F DC4 0.053 0.13 0.31 0.86 2.0 4.7 10 54 264 571 1211
Core2 Duo A1CP 2400 3AF DC4 0.043 0.11 0.26 0.72 1.7 3.7 8.2 18 42 134 1404
Core2 Duo B1CP 2400 3AF DC4 0.043 0.11 0.26 0.72 1.7 3.7 8.2 18 42 108 565
Core i5 2467M 2300 3WF DC8 0.036 0.080 0.18 0.48 1.1 2.5 6.7 16 34 111 258
Core i7 930 3060 3XF DC7 0.033 0.076 0.18 0.46 1.0 2.4 6.5 14 31 75 168
Core i7 860 3460 3XF DC8 0.033 0.076 0.18 0.46 1.0 2.4 6.3 14 30 72 171
Core i7 4820K 3900 3VF QC9 0.021 0.047 0.11 0.28 0.6 1.5 4.1 9 19 48 113
AMD K62 350 37B 100 1.1 2.4 6.2 27 65 167 375 903 2012 4336 9219
Duron 700 44F 133 0.17 0.37 0.82 2.4 14 74 170 399 1065 2423 5361
Athlon Tbird 1200 46F 133 0.10 0.21 0.46 1.3 6.1 20 167 401 934 2056 4605
Athlon 4 1725 46F DD1 0.066 0.15 0.32 0.91 4.3 11 82 193 462 1035 2160
Athlon 4 Bart 1800 47F#DD1 0.064 0.14 0.32 0.88 4.1 10 28 361 819 1800 3716
Turion 64 M 1900 47F DC4 0.072 0.16 0.34 0.89 4.0 9.3 23 99 233 556 1226
Athlon XP 2080 46F DD2 0.056 0.12 0.27 0.76 3.5 9.2 74 176 428 967 2014
Athlon 64aa 2210 47F DC3 0.051 0.11 0.25 0.73 3.0 7.4 17 101 227 514 1139
Phenom 3000 4ZF DC8 0.037 0.082 0.19 0.50 1.8 4.4 11 30 66 192 598
Double Precision Milliseconds
80486 66 15B 33 21 46 99 262 677 1493 3131 6595 13489
Pentium 100 16B 50 4.7 11 29 65 159 415 947 2010 4256
Pentium MMX 200 27B 66 1.6 5.5 13 27 60 176 393 903 1911 4051
Pentium Pro 200 16F 66 1.4 3.0 6.7 19 65 190 430 925 1980 4222
Celeron A 400 25F 66 0.49 1.2 4.6 18 56 134 295 635 1385 2916 6081
Pentium II 450 27H 100 0.45 2.1 4.7 10 23 65 233 528 1161 2381 4935
Pentium IIIE 550 26F 100 0.30 0.79 1.7 3.9 17 82 193 416 848 1819 3742
Pentium IIIEB 733 26F 133 0.23 0.60 1.3 3.0 16 69 160 349 750 1600 3295
Pentium 4 1500 16F RD2 0.19 0.43 0.96 2.5 9.8 48 119 284 650 1392 3080
Pentium IIIEB 1000 26F 133 0.17 0.41 1.0 2.6 16 68 166 360 772 1645 3493
Pentium IIIEB 1000 26F RD1 0.17 0.41 0.91 2.1 8.7 47 110 240 512 1142 2400
Pentium 4 1900 16F 133 0.15 0.33 0.75 1.9 13 92 213 463 1006 2095 4463
P4 Xeon 2200 17F RD2 0.13 0.29 0.65 1.7 4.1 19 101 247 574 1217 2684
Celeron M 1295 38F 0.11 0.25 0.67 1.5 3.2 7.2 39 296 712 1518 3127
Pentium 4N 3066 17F DD1 0.084 0.19 0.43 1.1 2.8 19 138 314 696 1421 3070
Pentium 4E 3000 28F DC3 0.076 0.20 0.42 0.93 2.1 5.0 22 114 251 524 1144
Pentium M2 1862 39F DC1 0.074 0.17 0.47 1.0 2.2 4.9 12 45 260 625 1361
Atom M 1600 H7F SCC 0.26 0.64 1.4 3.1 6.8 26 118 262 567 1156 2439
Core 2 Duo M 1830 39F DC5 0.069 0.17 0.45 1.0 2.3 5.0 12 41 200 428 871
Celeron C2 M 2000 38F DC4 0.064 0.15 0.42 0.94 2.1 4.6 26 139 301 605 1231
Core2 Duo A1CP 2400 3AF DC4 0.052 0.13 0.35 0.79 1.8 3.9 8.5 20 85 781 1824
Core2 Duo B1CP 2400 3AF DC4 0.052 0.13 0.35 0.79 1.8 3.9 8.5 20 54 293 704
Core i5 2467M 2300 3WF DC8 0.041 0.094 0.24 0.54 1.2 3.3 7.3 17 55 128 281
Core i7 930 3060 3XF DC7 0.040 0.091 0.23 0.52 1.2 3.2 7.1 15 37 86 284
Core i7 860 3460 3XF DC8 0.040 0.092 0.23 0.52 1.2 3.1 6.8 15 36 87 259
Core i7 4820K 3900 3VF QC9 0.023 0.054 0.14 0.31 0.7 2.0 4.4 9 23 57 192
AMD K62 350 37B 100 1.1 3.0 12 24 66 172 501 1141 2448 5082 10275
Duron 700 44F 133 0.20 0.43 1.3 7.6 39 90 205 547 1248 2756 5972
Athlon Tbird 1200 46F 133 0.11 0.23 0.66 3.0 11 89 209 529 1188 2605 5629
Athlon 4 1725 46F DD1 0.074 0.16 0.47 2.1 5.8 47 107 248 545 1146 2464
Athlon 4 Bart 1800 47F#DD1 0.075 0.16 0.46 1.9 4.6 16 186 422 926 1918 4065
Turion 64 M 1900 47F DC4 0.069 0.15 0.44 1.9 4.4 11 50 118 277 614 1366
Athlon XP 2080 46F DD2 0.065 0.14 0.40 1.7 4.7 34 83 211 479 1009 2196
Athlon 64aa 2210 47F DC3 0.058 0.13 0.36 1.4 3.4 8.9 51 119 258 559 1219
Phenom 3000 4ZF DC8 0.042 0.10 0.25 0.90 2.2 5.3 15 33 94 303 740
FFTGraf Version 2
Single Precision Milliseconds
Cache FFT Size K --->
Processor MHz & RAM 1 2 4 8 16 32 64 128 256 512 1024
80486 66 15B 33 16 35 82 186 403 858 1870 3948 8451
Pentium 100 16B 50 3.1 7.3 16 36 86 195 431 924 1952
Pentium MMX 200 27B 66 1.4 3.3 8.0 17 38 87 194 423 899 1894
Pentium Pro 200 16F 66 0.67 1.5 3.2 7.0 23 54 119 250 526 1115
Celeron A 400 25F 66 0.30 0.68 1.7 6.4 16 38 84 189 401 850 1789
Pentium IIIE 550 26F 100 0.21 0.47 1.1 2.3 7.1 19 44 95 201 429 940
Pentium IIIEB 660 26F 133 0.17 0.39 0.9 1.9 6.2 17 38 85 188 410 872
Celeron 2 900 25F 100 0.13 0.30 0.82 2.7 12 33 73 166 344 736 1568
PIII Tualatin 1266 27F 133 0.088 0.20 0.45 1.0 2.3 6.1 19 50 117 264 569
Pentium 4 1900 16F 133 0.075 0.18 0.46 1.1 3.3 9.4 27 69 160 353 768
Celeron M 1295 38F 0.073 0.16 0.36 0.84 1.9 4.3 11 34 91 211 484
Pentium 4E 3000 28F DC3 0.061 0.13 0.40 1.0 2.2 4.7 10 25 61 128 297
Pentium 4N 2400 17F RD2 0.060 0.14 0.35 0.78 1.9 5.1 16 48 118 259 575
Pentium 4N 2400 17F 133 0.060 0.14 0.35 0.78 2.0 5.9 20 58 128 283 648
Pentium M2 1862 39F DC1 0.052 0.11 0.25 0.59 1.4 2.9 6.3 16 41 107 245
Pentium 4N 3066 17F DD1 0.045 0.12 0.28 0.62 1.5 4.3 15 46 111 235 524
Atom M 1600 H7F SCC 0.46 0.49 1.1 2.3 5.3 12 28 68 147 324 700
Core 2 Duo M 1830 39F DC5 0.048 0.11 0.25 0.58 1.4 2.9 6.4 15 37 90 198
Celeron C2 M 2000 38F DC4 0.044 0.10 0.23 0.53 1.2 2.7 6.1 17 42 96 216
Core2 Duo A1CP 2400 3AF DC4 0.035 0.080 0.18 0.44 1.0 2.2 4.7 10 27 83 246
Core2 Duo B1CP 2400 3AF DC4 0.036 0.080 0.18 0.44 1.0 2.2 4.8 10 24 60 151
Core2 Duo B1CP 2400 3AF DC6 0.054 0.12 0.19 0.44 1.0 2.2 4.7 11 24 58 140
Core i5 2467M 2300 3WF DC8 0.030 0.061 0.13 0.30 0.70 1.5 3.3 7.3 16 39 84
Core i7 930 3060 3XF DC7 0.026 0.054 0.12 0.27 0.64 1.4 3.0 6.5 14 32 78
Core i7 860 3460 3XF DC8 0.026 0.055 0.12 0.28 0.63 1.4 3.0 6.4 14 32 74
Core i7 4820K 3900 3VF QC9 0.016 0.034 0.07 0.17 0.40 0.9 1.9 4.1 9 20 49
Duron 700 44F 133 0.13 0.26 0.55 1.7 6.4 17 42 96 229 524 1199
Athlon Tbird 1200 46F 133 0.075 0.16 0.33 0.89 3.5 12 36 82 199 465 1089
Athlon 4 1410 46F DD1 0.062 0.13 0.27 0.76 2.2 6.8 18 41 100 217 497
Athlon 4 1794 46F DD3 0.049 0.11 0.22 0.60 1.8 5.3 14 31 75 163 364
Athlon 4 Bart 1800 47F#DD1 0.049 0.10 0.22 0.61 1.6 4.8 22 52 126 277 620
Turion 64 M 1900 47F DC4 0.047 0.10 0.20 0.55 1.5 3.7 11 26 59 132 301
Athlon XP 2080 46F DD2 0.043 0.089 0.19 0.52 1.6 4.9 13 29 71 171 380
Athlon 64aa 2210 47F DC3 0.040 0.086 0.18 0.47 1.2 3.0 9.2 21 47 106 247
Phenom 3000 4ZF DC8 0.026 0.056 0.12 0.30 0.75 1.8 4.5 11 24 57 162
Double Precision Milliseconds
80486 66 15B 33 20 50 113 251 536 1258 2660 5654 11698
Pentium 100 16B 50 4.0 8.6 20 50 121 268 582 1224 2614
Pentium MMX 200 27B 66 1.7 4.5 10 21 49 111 244 560 1148 2417
Pentium Pro 200 16F 66 0.91 2.0 4.2 14 35 81 172 374 817 1779
Celeron A 400 25F 66 0.34 0.88 4.2 13 38 78 172 365 782 1645 3486
Pentium IIIE 550 26F 100 0.23 0.55 1.2 3.0 11 26 58 127 278 618 1374
Pentium IIIEB 660 26F 133 0.19 0.46 1.0 2.6 10 23 54 121 262 577 1276
Celeron 2 900 25F 100 0.15 0.36 1.6 12 33 77 170 333 683 1438 3073
PIII Tualatin 1266 27F 133 0.10 0.23 0.49 1.1 4.3 11 31 78 184 412 917
Pentium 4 1900 16F 133 0.10 0.23 0.51 1.4 5.9 16 36 85 185 406 907
Celeron M 1295 38F 0.082 0.19 0.44 0.94 2.1 6.2 21 56 130 295 668
Pentium 4N 2400 17F 133 0.075 0.18 0.39 1.0 3.7 12 33 75 169 372 819
Pentium 4N 2400 17F RD2 0.074 0.18 0.39 1.0 3.0 9.0 23 57 128 285 651
Pentium 4E 3000 28F DC3 0.062 0.18 0.49 0.97 2.8 5.9 15 34 73 167 390
Pentium 4N 3066 17F DD1 0.058 0.14 0.30 0.80 2.6 8.6 24 56 124 273 620
Pentium M2 1862 39F DC1 0.058 0.13 0.31 0.65 1.5 3.2 8.2 23 63 146 334
Atom M 1600 H7F SCC 0.23 0.51 1.1 2.4 5.6 14 32 71 156 337 739
Core 2 Duo M 1830 39F DC5 0.055 0.12 0.29 0.63 1.4 3.1 7.2 20 48 105 233
Celeron C2 M 2000 38F DC4 0.051 0.12 0.27 0.58 1.3 3.3 9.9 24 53 118 269
Core2 Duo A1CP 2400 3AF DC4 0.041 0.094 0.22 0.48 1.1 2.3 5.0 16 59 164 418
Core2 Duo B1CP 2400 3AF DC4 0.041 0.094 0.22 0.48 1.1 2.3 5.0 12 32 83 191
Core2 Duo B1CP 2400 3AF DC6 0.042 0.10 0.22 0.48 1.1 2.4 5.2 12 31 75 167
Core i5 2467M 2300 3WF DC8 0.032 0.069 0.15 0.33 0.90 1.7 3.7 8.6 20 44 97
Core i7 930 3060 3XF DC7 0.028 0.062 0.14 0.30 0.73 1.6 3.4 7.2 17 41 95
Core i7 860 3460 3XF DC8 0.028 0.062 0.14 0.30 0.71 1.5 3.3 7.0 16 39 88
Core i7 4820K 3900 3VF QC9 0.017 0.038 0.09 0.19 0.47 1.0 2.2 4.6 10 26 62
Duron 700 44F 133 0.14 0.28 0.88 5.0 15 34 76 172 379 836 1870
Athlon Tbird 1200 46F 133 0.081 0.17 0.45 1.6 7.9 22 53 123 282 645 1485
Athlon 4 1410 46F DD1 0.066 0.14 0.38 1.3 4.7 12 26 61 137 306 695
Athlon 4 1794 46F DD3 0.057 0.11 0.31 1.1 3.8 9.5 21 47 105 227 517
Athlon 4 Bart 1800 47F#DD1 0.053 0.11 0.30 1.0 4.2 15 35 81 183 409 925
Turion 64 M 1900 47F DC4 0.049 0.10 0.28 1.0 2.6 7.1 16 34 78 177 396
Athlon XP 2080 46F DD2 0.046 0.10 0.26 0.89 3.6 8.8 19 44 102 229 527
Athlon 64aa 2210 47F DC3 0.041 0.086 0.22 0.77 2.1 5.8 13 29 66 147 326
Phenom 3000 4ZF DC8 0.028 0.059 0.15 0.45 1.0 2.5 5.6 13 32 88 216
FFTGraf Version 3
Single Precision Milliseconds
Cache FFT Size K --->
Processor MHz & RAM 1 2 4 8 16 32 64 128 256 512 1024
Pentium 200 16B 66 1.5 3.9 8.4 19 43 97 220 484 1048 2218 4611
Pentium MMX 200 27B 66 1.4 3.2 7.8 17 38 86 192 417 882 1869
Pentium Pro 200 16F 66 0.80 1.8 3.8 8.3 22 55 121 263 557 1220
Pentium II 400 27H 100 0.30 0.77 2.5 5.4 12 31 81 189 409 876 1897
Celeron A 450 25F 100 0.27 0.60 1.4 3.3 10 24 51 109 237 502 1092
Pentium IIIE 550 26F 100 0.18 0.40 0.90 1.9 6.2 18 40 87 185 394 838 S
Pentium 4 1900 16F 133 0.074 0.16 0.35 0.71 2.3 7.6 22 50 107 236 571 S
Celeron M 1295 38F 0.071 0.15 0.33 0.83 1.9 4.3 11 33 86 194 436 S
Pentium 4N 2400 17F 133 0.058 0.13 0.27 0.57 1.4 4.3 16 49 104 224 521 S
Pentium 4N 2400 17F RD2 0.057 0.12 0.27 0.56 1.4 3.6 12 32 70 156 364 S
Pentium 4N 2533 17F DD1 0.055 0.12 0.25 0.52 1.3 3.6 12 37 78 169 393 S
Pentium 4N 2533 17F RD3 0.055 0.12 0.26 0.53 1.3 3.3 10 26 57 124 289 S
Pentium 4E 3000 28F DC3 0.052 0.11 0.23 0.49 1.2 2.8 6.3 17 39 83 182 S
Pentium M2 1862 39F DC1 0.050 0.10 0.23 0.58 1.3 2.9 6.3 15 39 95 213 S
Pentium 4N 3066 17F DD1 0.044 0.10 0.21 0.44 1.1 3.1 11 33 71 154 359 S
Pentium 4N 3678 17F DC3 0.038 0.086 0.18 0.37 0.91 2.3 6.9 19 42 92 231 S
Atom M 1600 H7F SCC 0.22 0.23 0.58 1.2 2.9 6.6 17 42 92 200 437 S
Core 2 Duo M 1830 39F DC5 0.033 0.07 0.16 0.38 0.89 2.0 5.0 10 27 65 136 S
Celeron C2 M 2000 38F DC4 0.032 0.07 0.15 0.35 0.82 1.8 4.4 14 34 73 159 S
Core2 Duo A1CP 2400 3AF DC4 0.024 0.053 0.12 0.29 0.67 1.5 3.2 6.8 19 66 213 S
Core2 Duo B1CP 2400 3AF DC4 0.025 0.053 0.12 0.29 0.67 1.5 3.2 6.8 16 42 108 S
Core i5 2467M 2300 3WF DC8 0.019 0.044 0.09 0.21 0.50 1.1 2.5 5.4 12 31 68 S
Core i7 860 3460 3XF DC8 0.023 0.048 0.10 0.23 0.57 1.3 2.7 5.7 12 27 65 S
Core i7 930 3060 3XF DC7 0.017 0.035 0.08 0.18 0.45 1.0 2.1 4.6 10 23 58 S
Core i7 4820K 3900 3VF QC9 0.010 0.022 0.05 0.12 0.29 0.7 1.4 3.0 7 15 37 S
Duron 750 44F 133 0.11 0.23 0.48 1.4 5.9 15 36 81 201 475 1112
Athlon Tbird 1200 46F 133 0.072 0.15 0.30 0.77 3.3 11 35 77 176 402 932
Athlon 4 1794 46F DD3 0.050 0.10 0.21 0.56 2.0 6.7 19 41 90 203 478 S
Athlon 4 Bart 1800 47F#DD1 0.050 0.10 0.22 0.56 1.5 4.8 22 54 119 265 602 S
Turion 64 M 1900 47F DC4 0.052 0.10 0.21 0.50 1.4 3.4 10 23 52 112 252 S
Athlon XP 2080 46F DD2 0.043 0.089 0.19 0.48 1.6 5.0 13 29 65 150 364 S
Athlon 64a 2000 48F DD3 0.041 0.083 0.17 0.46 1.3 3.1 7.2 20 49 112 262 S
Opteron 2000 48F DD3 0.040 0.082 0.17 0.46 1.3 3.0 7.4 21 50 121 289 S
Athlon 64aa 2210 47F DC3 0.036 0.074 0.15 0.40 1.1 2.9 8.4 18 43 95 214 S
Phenom 3000 4ZF DC8 0.020 0.041 0.085 0.22 0.60 1.5 3.8 8.5 19 46 133 S
Double Precision Milliseconds
Pentium 200 16B 66 2.2 4.8 11 23 58 131 291 701 1509 3031 6274
Pentium MMX 200 27B 66 1.6 4.4 9.4 20 49 111 244 532 1133 2392
Pentium Pro 200 16F 66 1.0 2.2 4.9 17 37 88 193 410 881 1932
Pentium II 400 27H 100 0.41 1.6 3.5 7.7 21 54 128 285 620 1339 2914
Celeron A 450 25F 100 0.29 0.73 1.7 8.4 21 44 95 205 445 949 2037
Pentium IIIE 550 26F 100 0.23 0.53 1.1 2.8 11 25 57 127 276 604 1319
Celeron M 1295 38F 0.10 0.23 0.54 1.2 2.7 7.3 23 60 138 308 691 S
Pentium 4 1900 16F 133 0.091 0.19 0.40 1.0 4.9 14 32 69 147 322 755 S
Pentium 4N 2400 17F 133 0.071 0.15 0.31 0.70 3.0 11 31 68 147 315 712 S
Pentium M2 1862 39F DC1 0.070 0.16 0.38 0.81 1.9 4.0 10 26 66 151 331 S
Pentium 4N 2400 17F RD2 0.069 0.14 0.31 0.68 2.3 7.1 19 42 92 197 440 S
Pentium 4N 2533 17F RD3 0.067 0.14 0.30 0.66 2.0 5.8 15 34 74 158 354 S
Pentium 4N 2533 17F DC1 0.065 0.14 0.30 0.64 2.2 6.8 19 41 89 190 428 S
Pentium 4E 3000 28F DC3 0.058 0.13 0.28 0.60 1.5 3.7 11 24 51 114 255 S
Pentium 4N 3066 17F DD1 0.054 0.11 0.24 0.54 2.1 7.3 21 45 99 212 475 S
Pentium 4N 3678 17F DC3 0.046 0.10 0.21 0.44 1.4 4.2 11 25 54 115 256 S
Atom M 1600 H7F SCC 0.19 0.46 0.99 2.1 5.2 13 30 70 155 339 746 S
Core 2 Duo M 1830 39F DC5 0.042 0.094 0.23 0.50 1.1 2.5 6.3 17 42 89 190 S
Celeron C2 M 2000 38F DC4 0.040 0.087 0.21 0.46 1.1 2.8 9.1 22 49 106 232 S
Core2 Duo A1CP 2400 3AF DC4 0.031 0.070 0.17 0.38 0.87 1.9 4.1 13 50 154 362 S
Core2 Duo B1CP 2400 3AF DC4 0.031 0.071 0.18 0.38 0.87 1.9 4.1 10 28 70 158 S
Core i5 2467M 2300 3WF DC8 0.024 0.059 0.14 0.27 0.66 1.5 3.2 7.2 17 38 84 S
Core i7 860 3460 3XF DC8 0.028 0.062 0.14 0.30 0.76 1.7 3.6 7.6 17 40 88 S
Core i7 930 3060 3XF DC7 0.023 0.051 0.12 0.26 0.65 1.4 3.1 6.5 15 37 86 S
Core i7 4820K 3900 3VF QC9 0.013 0.030 0.07 0.15 0.40 0.9 1.9 4.0 9 22 55 S
Duron 750 44F 133 0.12 0.25 0.85 5.1 15 33 73 163 365 821 1872
Athlon Tbird 1200 46F 133 0.080 0.16 0.46 1.6 8.1 23 52 120 274 636 1505
Athlon 64a 2000 48F DD3 0.063 0.13 0.34 1.1 2.5 5.9 16 35 77 174 392 S
Opteron 2000 48F DD3 0.062 0.13 0.34 1.1 2.5 6.0 15 34 80 187 433 S
Athlon 64aa 2210 47F DC3 0.056 0.12 0.29 0.90 2.4 6.3 14 30 65 145 315 S
Athlon 4 1794 46F DD3 0.049 0.10 0.32 1.2 4.6 12 25 53 117 265 636
Athlon 4 Bart 1800 47F#DD1 0.049 0.10 0.31 1.2 4.2 15 36 79 172 373 872
Turion 64 M 1900 47F DC4 0.068 0.14 0.36 1.1 2.9 7.4 16 35 76 167 367 S
Athlon XP 2080 46F DD2 0.043 0.092 0.27 0.99 3.6 9.1 20 42 90 205 482
Phenom 3000 4ZF DC8 0.028 0.058 0.15 0.53 1.3 2.9 6.3 14 32 82 186 S
Linux Benchmarks
The Linux benchmarks were recompiled under Ubuntu 14.04 using GCC 4.8.2, which can handle later Intel CPU instructions, including AVX1, and the results are included below. A 64 bit version of this Ubuntu was installed on an external USB 3 disk drive to work on a PC that boots to UEFI mode. Another 64 bit version was installed on a USB 2 flash drive that can be used successfully on different PCs.
Then, a 32 bit version was used to compile 32 bit benchmarks.
In order to run the latter on 64 bit systems, 32 bit lib386 Shared Object files have to be installed. Numerous proposed methods of installing these are available on the web. The methods used for the USB disk drive installation handle all benchmarks tried so far, but the 64 bit flash drive lacks support for 32 bit OpenMP.
Some results included were compiled and run via older Ubuntu releases, including 32 bit versions.
Three of the benchmarks, including source code and compile commands, are in
memory_benchmarks.tar.gz,
with others in
AVX_benchmarks.tar.gz,
linux_openmp.tar.gz
and
FFTbenchmarks.zip.
Further details are provided below, including differences to the Windows benchmarks in some of the functions used.
MemSpeed memory_speed32, memory_speed64, memory_speed64AVX
The non-AVX tests were not included in the
memory_benchmark collection,
because of the inexplicably slow performance of the
Windows MemSpeed program.
At a later date, OpenMP based benchmarks were produced, including one for MemSpeed, along with normal non-OpenMP versions made by omitting the OMP directives. These memory_speed32 and memory_speed64 benchmarks are in
linux_openmp.tar.gz
and those for memory_speed64AVX in
AVX_benchmarks.tar.gz.
These benchmarks are somewhat different to
the Windows version,
using all C functions instead of assembly code and the first set of tests comprising x[m]=x[m]+s*y[m] instead of s=s+x[m]*y[m], the former being equivalent to the performance dependent calculations in the
Linpack Benchmark.
The AVX version is produced from the same source code by simply including the -mavx parameter in the compile command. Note that running this benchmark on a CPU without AVX functions leads to an illegal instruction error.
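One way to avoid the illegal instruction failure is to test for AVX at run time before starting an AVX build. The following is a minimal sketch, not part of the original benchmarks, assuming GCC 4.8 or later (or Clang) on an x86 CPU, where the `__builtin_cpu_supports` built-in is available. The function names `have_avx` and `report_avx` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the CPU reports AVX support, else 0.
   Assumes GCC or Clang on x86; any other target simply reports 0. */
int have_avx(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                      /* harmless if already initialised */
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Prints the outcome so a launcher can choose the AVX or non-AVX program. */
int report_avx(void)
{
    int ok = have_avx();
    printf("AVX %s\n", ok ? "available" : "not available, use the non-AVX build");
    return ok;
}
```

A check like this is best kept in a source file compiled without -mavx, since a file compiled with that parameter may itself contain AVX instructions.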
The first calculations are of the following format, except that the integer version adds y[] instead of multiplying it by sum. Due to an oversight, the integer sum variable was zero, as used in the earlier assembly code, so the compiler omitted it. The integer code then became the same as the second set of integer calculations.
for (m=0; m<kd; m=m+inc)
{
x[m] = x[m] + sum * y[m];
x[m+1] = x[m+1] + sum * y[m+1];
x[m+2] = x[m+2] + sum * y[m+2];
x[m+3] = x[m+3] + sum * y[m+3];
}
The integer version is actually x[m] = x[m] + y[m]; etc.
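The two forms can be sketched as complete C functions. This is an illustration only, assuming the integer variant was written as x[m] + sum + y[m] (our reading of the note above) and that the element count is a multiple of 4. The function names are hypothetical and no timing code is included.

```c
#include <stddef.h>

/* Floating point form: x[m] = x[m] + s * y[m], unrolled four ways.
   n must be a multiple of 4, matching the loop above with inc = 4. */
void fp_pass(double *x, const double *y, double s, size_t n)
{
    for (size_t m = 0; m < n; m += 4) {
        x[m]     = x[m]     + s * y[m];
        x[m + 1] = x[m + 1] + s * y[m + 1];
        x[m + 2] = x[m + 2] + s * y[m + 2];
        x[m + 3] = x[m + 3] + s * y[m + 3];
    }
}

/* Assumed integer form: addition instead of multiplication.
   When sum is zero, the result is simply x[m] + y[m]. */
void int_pass(int *x, const int *y, int sum, size_t n)
{
    for (size_t m = 0; m < n; m += 4) {
        x[m]     = x[m]     + sum + y[m];
        x[m + 1] = x[m + 1] + sum + y[m + 1];
        x[m + 2] = x[m + 2] + sum + y[m + 2];
        x[m + 3] = x[m + 3] + sum + y[m + 3];
    }
}
```

With sum fixed at zero and known at compile time, an optimising compiler is free to drop the addition of sum, which is consistent with the behaviour described above.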
Below is an example log of all results using a Core i7 CPU. Note the recorded Int32 speeds.
Core i7 4820K mainly running at 3.9 GHz using Turbo Boost
1600 MHz RAM over 4 channels, Windows 10
Memory Reading Speed Test 64 Bit Version 4.1 by Roy Longbottom
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int32 Dble Sngl Int32 Dble Sngl Int32
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
L1 4 35055 24094 50991 35054 24063 50855 28639 19434 28857
8 35561 24552 56045 35533 24531 56189 29986 20126 29989
16 35600 24756 59167 35651 24779 59100 30580 20485 30631
32 35649 24897 60694 35696 24882 60668 29004 20668 30922
L2 64 31928 24488 47659 33712 24869 47557 23979 20652 29566
128 31968 24523 47270 33755 24904 46989 23756 20673 29348
256 30341 24404 42477 31936 24817 42498 21201 19791 26384
L3 512 25170 23307 30347 26145 23920 30347 15217 15231 17256
1024 25136 23215 30197 26016 23823 30161 15206 15179 17216
2048 25110 23257 30095 26043 23853 30095 15157 15126 17309
4096 25127 23216 30017 25973 23768 29979 15137 15129 17234
8192 25030 23284 29765 25944 23858 29768 14975 14987 17024
16384 15624 15696 15474 15835 15948 15474 7769 7710 7660
32768 14474 14578 14423 14670 14782 14417 7283 7270 7223
65536 14734 14893 14683 14976 15047 14683 7401 7387 7342
R 131072 15054 15204 14955 15262 15376 14952 7519 7513 7448
262144 15224 15373 15032 15451 15560 15088 7599 7589 7525
524288 15312 15433 15163 15486 15638 15164 7631 7628 7558
1048576 15295 15459 15220 15562 15691 15202 7648 7644 7575
2097152 15374 15526 15231 15587 15723 15225 7655 7653 7583
4194304 15393 15544 15241 15588 15716 15232 7670 7660 7592
R=RAM
MemSpeed Comparisons
Here, maximum MFLOPS are provided for the first floating point tests, by dividing MB/second by 8 for double precision (DP) and by 4 for single precision (SP).
With 64 bit operation, SSE SIMD mulp and addp instructions are used for SP, with 4 words in 128 bit xmm registers, using SSE2 for DP, with 2 words in the registers. These provide up to 4 or 2 simultaneous calculations respectively, at least providing significant SP performance gains.
AVX 1 ymm registers have 256 bits, with vmulp and vaddp instructions being compiled, potentially doubling SSE and SSE2 speeds, with 4 DP and 8 SP words. In this case, the compiler appears to have further unrolled the SP calculation loop from 4 to 8 x[] and y[] addresses. For the i7 results shown, AVX SP MFLOPS increased by more than 3.2 times.
Maximum Integer MIPS are not shown, but for the 32 bytes (8 words) read, the assembly code instructions used were 12 at 32 bits and 7 at both 64 bits and AVX, so MIPS can be obtained by dividing MB/second by 2.67 or 4.57 respectively.
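The conversions described above can be written as two small C functions. This is a sketch with hypothetical function names, using only the divisors stated in the text: 8 bytes per floating point operation for DP, 4 for SP, and 32 bytes per 12 or 7 integer instructions.

```c
#include <assert.h>

/* MFLOPS for the x[m]=x[m]+s*y[m] tests:
   MB/second divided by 8 (double precision) or 4 (single precision). */
double mflops_from_mbps(double mbps, int is_double)
{
    return mbps / (is_double ? 8.0 : 4.0);
}

/* Integer MIPS: instr_per_32_bytes is 12 for the 32 bit code and
   7 for the 64 bit and AVX code, equal to dividing by 2.67 or 4.57. */
double mips_from_mbps(double mbps, int instr_per_32_bytes)
{
    assert(instr_per_32_bytes > 0);
    return mbps * instr_per_32_bytes / 32.0;
}
```

For example, the 64 bit Core i7 L1 results of 35055 MB/second (DP) and 24094 MB/second (SP) give 4382 and 6024 MFLOPS after rounding, matching the Max MFLOPS line in the table below.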
Intel Core i7 3900 MHz
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int32 Dble Sngl Int32 Dble Sngl Int32
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
64 bit
L1 4 35055 24094 50991 35054 24063 50855 28639 19434 28857
L2 64 31928 24488 47659 33712 24869 47557 23979 20652 29566
L3 512 25170 23307 30347 26145 23920 30347 15217 15231 17256
RAM 1GB 15295 15459 15220 15562 15691 15202 7648 7644 7575
Max MFLOPS 4382 6024
32 bit
L1 4 34692 17689 17790 35225 17813 17729 28530 14966 14985
L2 64 31050 17302 17833 32912 17797 17826 23586 13465 13439
L3 512 25186 17162 17689 26020 17597 17689 15493 11042 11007
RAM 1GB 14925 13795 14008 15043 13962 14000 7429 7596 7596
Max MFLOPS 4337 4422
64 Bit AVX
L1 4 59966 56645 52617 57685 56546 57879 37587 37408 37239
L2 64 48203 40892 48473 49140 49052 48634 30648 30484 30511
L3 512 32499 31113 33002 33332 33388 33001 19044 19007 19036
RAM 1GB 14737 14973 14564 14552 14545 14585 7365 7363 7358
Max MFLOPS 7496 14161
AMD Phenom II 3000 MHz
64 bit
L1 4 25660 18401 30360 23110 20146 30376 22347 15111 15262
64 27372 19184 31675 24405 21049 31675 23940 15971 15932
512 17329 16351 20723 17300 16766 20722 10540 10616 10489
RAM 1GB 6386 6105 6420 6382 6282 6387 3250 3296 3230
Max MFLOPS 3208 4600
32 bit
L1 4 21508 11080 11610 22942 11245 11576 12599 11693 11606
64 22600 11317 11662 24020 11454 11616 12085 11974 11975
512 14511 9152 9410 14641 9496 9407 7824 6723 6712
RAM 1GB 6569 5962 6254 6325 5991 6259 3414 3223 3244
Max MFLOPS 2689 2770
64 Bit AVX Illegal instruction (no AVX instructions)
Intel Core 2 Duo 2400 MHz
64 bit
L1 4 15901 12391 10680 17787 12440 10680 18827 9222 6212
64 12237 11368 10424 12240 11049 10433 7883 7920 6380
512 12261 11379 10445 12262 11053 10444 7848 7905 6392
RAM 0.5GB 3421 3420 3387 3454 3426 3395 1788 1731 1761
Max MFLOPS 1988 3098
32 bit
L1 4 17321 8606 9464 19039 9441 9463 18888 9279 9276
64 11722 7628 7986 11344 7607 7988 8001 5369 5358
512 7987 5078 5326 7569 5102 5320 5363 3582 3576
RAM 0.5GB 3441 3356 3389 3621 3357 3388 1787 1727 1736
Max MFLOPS 2165 2052
64 Bit AVX Illegal instruction (no AVX instructions)
BusSpeed - busspeed32, busspeed64
Unlike
Windows BusSpeed Benchmark,
this one mainly uses up to 64 C AND statements, instead of assembly code. The exception is the 128bSSE2 test that comprises 64 pand assembly instructions. For this benchmark, the 64 bit version uses 64 bit integers.
The benchmarks and source code are in
memory_benchmarks.tar.gz.
Maximum MIPS speeds are provided for reading all data into integer and SSE type registers. As with
The Windows Benchmark,
RAM speeds from these single core tests are nowhere near the specification, and multiple cores need to be used to approach this.
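The reading pattern behind the Inc32wds to ReadAll columns can be sketched in C as follows. This is a simplified illustration, not the benchmark source: the real program unrolls up to 64 AND statements per loop pass, whereas this version uses one per pass, and the names `and_read` and `max_mips` are hypothetical. The MIPS rule is inferred from the tables below, where the Max MIPS figures match the first L1 ReadAll speed divided by the word size (8 bytes at 64 bits, 4 bytes at 32 bits).

```c
#include <stddef.h>
#include <stdint.h>

/* ANDs together every inc-th 64 bit word and reports how many were read.
   inc = 32 corresponds to Inc32wds, inc = 1 to ReadAll. */
uint64_t and_read(const uint64_t *data, size_t words, size_t inc, size_t *reads)
{
    uint64_t acc = ~(uint64_t)0;   /* all bits set, neutral for AND */
    size_t n = 0;
    for (size_t i = 0; i < words; i += inc) {
        acc &= data[i];
        n++;
    }
    *reads = n;
    return acc;
}

/* Inferred rule: one instruction per word read. */
double max_mips(double mbps, int bytes_per_word)
{
    return mbps / bytes_per_word;
}
```

For the 64 bit Core i7 run, 42603 MB/second divided by 8 gives 5325 MIPS, as shown in the Max MIPS line.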
Intel Core i7 3900 MHz
Bus Speed Test 64 bit Version 2.0 Thu Sep 28 17:16:54 2017 #64 bit Integers
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
L1 6 31239 31266 31259 42217 38193 42603 61440
24 31291 31277 31274 41804 39352 42873 62262
L2 96 12301 12620 12598 21826 30551 39152 57284
384 5508 5565 5700 11234 20684 34185 42054
768 5304 5392 5504 10811 19331 33475 38119
L3 1536 5273 5371 5503 10815 19411 33663 38265
RAM 16380 1284 1566 2174 4770 9130 18560 19152
131070 1225 1486 2099 4549 8741 18116 18376
393210 1225 1486 2098 4548 8738 18136 18354
Max MIPS 5325 7680#
32 bit
6 15320 15478 20506 18275 20314 21357 60664
96 7476 7624 11483 16509 20036 21095 60585
1536 2692 2763 5393 9657 16704 21006 38151
393210 744 1049 2252 4365 9073 16411 18322
Max MIPS 5339 15166
AMD Phenom II 3000 MHz
64 bit
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 21389 22715 26255 27025 27023 26405 23754
96 2988 2970 2989 5984 11764 20688 23792
1536 1294 1294 1294 2584 5176 10227 10350
393210 930 933 939 1856 3840 6620 7695
Max MIPS 3301 2969#
32 bit
6 11117 12846 13522 13682 13464 13337 23747
96 1494 1496 2984 5879 10598 12484 23877
1536 646 646 1291 2586 5108 9128 10334
393210 464 470 931 1917 3295 5497 7605
Max MIPS 3334 5937
Intel Core 2 Duo 2400 MHz
64 bit
6 10623 11662 12091 12350 12469 12763 24901
96 2768 2768 2732 4471 6076 8959 12758
1536 2783 2782 2742 4472 6079 8952 12804
393210 640 641 745 1499 2619 4915 5122
Max MIPS 1595 3113#
32 bit
6 8568 9064 9171 9314 9405 9429 24883
96 1383 1366 2182 3038 4474 5394 12728
1536 1470 1359 2176 3032 4479 5396 12785
393210 321 372 747 1317 2467 4273 5110
Max MIPS 2357 6221
RandMem - randmem32, randmem64
These benchmarks are compiled with the same C code calculations as
the Windows version.
They use the same format of complex integer based indexing for serial and random reading and writing, with final data transferred being either 32 bit integer or 64 bit double precision floating point numbers. The benchmarks and source code are in
memory_benchmarks.tar.gz.
Measured MB/second speeds are often effectively the same as
the Windows version
and between 32 bit and 64 bit compilations.
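The difference between serial and random access through an index can be illustrated with a short C sketch. The indexing scheme of the actual benchmark is not shown in this section, so the layout here is an assumption: a plain index array that is either in ascending order or shuffled, with a simple linear congruential generator standing in for whatever random sequence the benchmark uses. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Serial order: idx[i] = i. */
void fill_serial(uint32_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (uint32_t)i;
}

/* Random order: Fisher-Yates shuffle driven by a small LCG,
   so each element is still visited exactly once. */
void shuffle(uint32_t *idx, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = n; i > 1; i--) {
        state = state * 1664525u + 1013904223u;
        size_t j = (size_t)(state >> 8) % i;
        uint32_t t = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = t;
    }
}

/* Reads data through the index; only the access order differs. */
int64_t indexed_sum(const int32_t *data, const uint32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}
```

Both orders read the same words and produce the same sum, so any difference in MB/second comes from the access pattern, which is why the random columns fall well below the serial ones once data no longer fits in L1 cache.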
Intel Core i7 3900 MHz
Random/Serial Memory Test 64 Bit Version 2 Thu Sep 28 17:19:24 2017
Integer....................... Double/Integer................
Serial........ Random........ Serial........ Random........
RAM Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt Read Rd/Wrt
KB MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec MB/Sec
L1 6 26897 28366 26501 25786 30253 43467 30413 43903
12 26968 28832 26380 27136 29908 43492 29908 43009
24 27070 29211 26533 28203 29833 43659 29846 42796
48 23201 23715 18765 12802 29685 33930 29673 30590
L2 96 23236 23770 13926 8923 29764 34102 22945 14939
192 22993 21952 9837 6748 29257 32009 18268 12080
L3 384 22393 18694 8006 5842 28141 25442 14411 9830
768 22292 18045 6049 4976 27842 23271 10213 8064
1536 22322 18046 5414 4581 27854 23306 8808 7312
3072 21970 17466 3246 3144 27501 23175 8219 6897
6144 22439 18121 5052 4282 27419 23172 3477 3282
R 12288 15218 12304 2523 2694 20306 16075 4143 4373
24576 13920 11179 1333 1338 18171 13831 2317 2394
49152 14004 11264 1071 1062 17806 13683 1756 1784
98304 14083 11308 973 865 18624 13836 1589 1559
196608 14060 11234 930 685 18625 13840 1495 1177
393216 14073 11325 910 624 18609 13836 1452 992
786432 14093 11366 901 603 18567 13695 1433 935
1572864 13966 11357 892 614 18651 13840 1422 923
R=RAM
32 bit
6 24707 28684 24260 28013 29137 42429 29249 43078
96 22416 23796 13381 8764 29711 33703 23322 14583
3072 21188 17452 3239 3145 26582 23052 8270 6913
98304 13835 11296 970 899 18504 13735 1590 1551
AMD Phenom II 3000 MHz
64 bit
6 12535 9131 12630 9061 16804 13612 16819 13611
96 11973 8457 6859 5215 16923 11770 16495 11879
3072 6118 7092 1202 1165 9527 9192 2045 2040
98304 4367 3628 639 586 7131 5904 1077 959
32 bit
6 13435 11393 12822 11139 16736 20256 16787 19821
96 11481 10024 6903 5507 16967 16169 16542 14603
3072 7718 7388 1072 1047 9433 9043 2048 2043
98304 4423 3653 651 594 7081 5744 1079 944
Intel Core 2 Duo 2400 MHz
64 bit
6 9150 12202 9152 5156 13712 16195 13714 15619
96 8010 9497 4112 3702 11341 11886 7376 6420
3072 7799 9287 2835 2598 10811 10442 3725 3357
98304 3337 2345 471 345 4671 2766 711 561
32 bit
6 8586 12171 8576 6574 13635 18131 13634 18092
96 7620 9425 4015 3735 11355 12085 7371 6441
3072 5050 6122 1931 1784 7303 6878 2521 2232
98304 3858 2056 436 334 4990 2763 706 560
SSEfpu - ssefpu32, ssefpu64
This is a variation of the
SSE3DNow Benchmark,
with extensions but excluding AMD 3DNow tests. The benchmark measures Single Precision (SP) and Double Precision (DP) Floating Point speeds, data streaming from caches and RAM. It uses SSE (SP) and SSE2 (DP) assembly code instructions, along with compiled C code that produces the old x87 instructions at 32 bits and SSE type for working on a 64 bit system. The additional tests avoid intermediate register to register operations using s=(s+x[m])*y[m] and s=s+x[m]+y[m], to produce much faster speeds. The former leads to linked multiply and add operations that can produce up to eight floating point operations per clock cycle, or 31.2 GFLOPS on the Core i7 reported on below, with the appropriate test achieving up to a respectable 25 GFLOPS.
Note that results from 64 bit and 32 bit compilations can be virtually the same. This could be expected for SSE tests, as they use the same SSE assembly code instructions.
Even the integer test results can be similar, with the 32 bit version compiled to use the old x87 floating point instructions and SSE instructions at 64 bits, but limited to scalar operation, dealing with only one of the four SSE register compartments. SSE performance is also similar to that from the
Windows Benchmark,
but results are completely different for the compiled integer tests (old compilation folder not available to investigate).
The benchmarks and source code are in
memory_benchmarks.tar.gz.
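The peak and measured GFLOPS figures quoted above follow from simple arithmetic, sketched here in C with hypothetical function names. The only assumption is that each calculation step reads one x[] word and one y[] word, which is consistent with the Maximum MFLOPS lines in the results below.

```c
#include <assert.h>

/* Theoretical peak: clock speed in MHz times operations per clock. */
double peak_mflops(double mhz, double flops_per_clock)
{
    return mhz * flops_per_clock;
}

/* Measured MFLOPS from MB/second. flops_per_step is 2 for
   s=(s+x[m])*y[m] and s=s+x[m]*y[m], and 1 for x[m]=x[m]+y[m].
   bytes_per_word is 4 for SP and 8 for DP. */
double sse_mflops(double mbps, double flops_per_step, double bytes_per_word)
{
    assert(bytes_per_word > 0.0);
    return mbps * flops_per_step / (2.0 * bytes_per_word);
}
```

At 3900 MHz and eight operations per clock the peak is 31200 MFLOPS, or 31.2 GFLOPS. The best +*SSE result of 100245 MB/second converts to 25061 MFLOPS, about 80% of that peak.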
Intel Core i7 3900 MHz
SSE & SSE2 Memory Reading Speed Test 64-Bit Version 2.1
Memory --s=s+x[m]*y[m]--- --x[m]=x[m]+y[m]-- (s+x[m])?y[m]
KBytes SSE2 SSE Sngl SSE2 SSE Sngl +*SSE ++SSE
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
L1 4 41006 41014 10752 78678 75044 28425 93492 61309
8 41329 41332 10585 78495 78680 27592 99656 61421
16 41485 41485 10501 80823 80935 27681 100245 60960
32 41562 41550 10459 81545 81550 27726 93422 60961
L2 64 41482 41442 10437 50270 50047 27208 56854 57013
128 41516 41524 10428 49254 49178 27219 56004 56140
256 40293 40326 10423 46558 46549 26748 48312 48513
L3 512 37261 37298 10418 32513 32531 24421 39719 39780
1024 36790 36813 10414 31430 31425 24132 38698 38793
2048 36880 36904 10418 31394 31400 24202 38839 38906
4096 36931 36929 10415 31399 31381 24271 38891 38958
8192 36791 36873 10416 31254 31306 24281 38765 38790
16384 21227 21228 9540 15121 15124 15659 20817 20834
32768 21407 21377 9560 14777 14762 15431 20967 20951
65536 21831 21843 9576 14980 14981 15592 21380 21383
R 131072 22093 22104 9585 14980 14985 15600 21611 21649
262144 22310 22297 9586 14986 15037 15675 21782 21792
524288 22431 22530 9581 15054 15039 15682 21932 21931
1048576 22591 22604 9590 15040 15055 15692 22035 22026
2097152 22629 22634 9587 15059 15062 15700 22120 22108
4194304 21864 21868 9573 14881 14873 15461 21372 21407
R=RAM
SSE2 SSE Norm SSE2 SSE Norm SSE SSE
Maximum DP SP SP DP SP SP SP SP
MFLOPS 5195 10388 2688 5097 10194 3553 25061 15355
32 bit
L1 4 40984 41012 10755 79355 79200 21372 90456 61235
64 41499 41546 10440 49718 49820 18195 57058 57309
512 35927 35840 10415 30957 30986 16915 37994 38140
RAM 1GB 20978 20953 10081 14733 14734 12392 20633 20624
AMD Phenom II 3000 MHz
64 bit
L1 4 22720 22649 6141 43355 43377 23298 66228 41175
64 23878 23878 6017 44716 45514 23782 85916 46752
512 20095 20048 6000 18630 18629 16662 20036 20018
RAM 1GB 8163 8260 5395 6754 6794 6757 8046 7939
32 bit
L1 4 22723 22686 6128 43666 41471 11794 66231 41868
64 23841 23864 6018 42659 39727 11638 86456 46784
512 17425 17335 5991 16456 16441 9528 17529 17536
RAM 1GB 8511 8519 5484 6921 6915 6199 8295 8256
Intel Core 2 Duo 2400 MHz
64 bit
L1 4 25197 25195 6601 36943 36943 13349 34725 34993
64 18093 18606 6400 17062 17062 12685 19620 19639
512 18343 18736 6396 17125 17128 12703 19793 19809
RAM 0.5GB 5712 5756 3951 3628 3501 3391 5676 5731
32 bit
L1 4 25193 25195 6603 37082 37081 9869 35725 35222
64 11904 11846 4261 11227 11228 5039 12454 12540
512 11927 11887 4261 11261 11261 5071 12586 12446
RAM 0.5GB 5727 5741 3956 3471 3499 3310 5668 5704
FFT Benchmarks - FFT1, FFT3c
The benchmarks and source code are in
fftgraf.zip,
that also contains benchmarks using the same format for Windows, Raspberry Pi and Android.
An example of logged results is below and these can be compared with
Windows Results above,
for the same Core i7 system that produced slightly different performance, but identical numerical checks.
As a reminder, these benchmarks are all C code, with FFT1 being the original program and FFT3c the third optimised one, with rearranged C statements instead of assembly code.
Detailed results are below, along with comparisons that demonstrate performance gains of single vs double precision, 64 vs 32 bit and FFT3c vs FFT1.
These demonstrate the variability of gains between different processors and FFT sizes.
FFT 64 Bit Benchmark Version 3c.0 Thu Sep 3 10:32:25 2015
Size milliseconds
K Single Precision Double Precision
1 0.019 0.012 0.012 0.016 0.016 0.016
2 0.029 0.026 0.026 0.035 0.035 0.035
4 0.063 0.058 0.058 0.079 0.079 0.079
8 0.147 0.136 0.136 0.177 0.176 0.176
16 0.333 0.315 0.314 0.365 0.364 0.364
32 0.710 0.683 0.687 0.783 0.785 0.784
64 1.521 1.467 1.469 1.696 1.699 1.693
128 3.285 3.186 3.181 3.639 3.633 3.637
256 7.303 6.950 6.947 8.140 8.088 8.145
512 15.859 15.442 15.437 21.008 21.054 21.187
1024 38.551 37.789 37.776 65.300 65.009 65.388
1024 Square Check Maximum Noise Average Noise
SP 9.999520e-01 3.346482e-06 4.565234e-11
DP 1.000000e+00 1.133294e-23 1.428110e-28
Cache FFT Size K ---> Results in milliseconds
Processor MHz & RAM 1 2 4 8 16 32 64 128 256 512 1024
FFT1 SP 64 bit
Core 2 Duo 2400 3AF DC4 0.037 0.09 0.22 0.64 1.48 3.4 7.7 17.0 37 96 587
Phenom 3000 4ZF DC8 0.026 0.06 0.14 0.34 1.55 4.0 9.8 27.6 65 151 549
Core i7 4820K 3900 3VF QC9 0.014 0.03 0.07 0.22 0.56 1.4 3.9 9.1 21 49 111
FFT1 DP 64 bit
Core 2 Duo 2400 3AF DC4 0.044 0.11 0.30 0.69 1.57 3.5 7.6 16.5 47 317 763
Phenom 3000 4ZF DC8 0.031 0.07 0.17 0.80 2.04 5.0 14.0 32.7 76 283 712
Core i7 4820K 3900 3VF QC9 0.016 0.04 0.11 0.27 0.67 1.9 4.5 10.6 24 55 234
FFT3c SP 64 bit
Core 2 Duo 2400 3AF DC4 0.029 0.07 0.17 0.41 0.92 2.0 4.3 9.4 21 52 141
Phenom 3000 4ZF DC8 0.021 0.05 0.10 0.26 0.66 1.6 4.0 9.3 21 53 153
Core i7 4820K 3900 3VF QC9 0.012 0.03 0.06 0.14 0.31 0.7 1.5 3.2 6.9 15 38
FFT3c DP 64 bit
Core 2 Duo 2400 3AF DC4 0.054 0.12 0.29 0.63 1.28 2.8 6.1 14.1 34 85 195
Phenom 3000 4ZF DC8 0.026 0.05 0.14 0.40 0.87 2.1 4.6 10.7 27 78 96
Core i7 4820K 3900 3VF QC9 0.016 0.04 0.08 0.18 0.36 0.8 1.7 3.6 8.1 21 65
FFT1 SP 32 bit
Core 2 Duo 2400 3AF DC4 0.038 0.09 0.23 0.65 1.56 3.6 8.2 18.1 39 108 441
Phenom 3000 4ZF DC8 0.029 0.07 0.19 0.35 1.59 4.0 9.8 27.7 65 150 535
Core i7 4820K 3900 3VF QC9 0.018 0.04 0.09 0.26 0.64 1.6 4.5 10.3 23 53 118
FFT1 DP 32 bit
Core 2 Duo 2400 3AF DC4 0.043 0.11 0.30 0.73 1.68 3.8 8.6 19.0 58 247 624
Phenom 3000 4ZF DC8 0.029 0.09 0.24 0.81 2.04 4.9 13.9 32.3 75 282 711
Core i7 4820K 3900 3VF QC9 0.018 0.04 0.12 0.30 0.74 2.2 5.0 11.4 26 60 296
FFT3c SP 32 bit
Core 2 Duo 2400 3AF DC4 0.033 0.08 0.18 0.43 0.95 2.1 4.6 10.0 23 54 127
Phenom 3000 4ZF DC8 0.028 0.06 0.13 0.31 0.77 1.8 4.4 10.0 23 55 157
Core i7 4820K 3900 3VF QC9 0.015 0.03 0.07 0.17 0.38 0.8 1.8 3.9 8 19 46
FFT3c DP 32 bit
Core 2 Duo 2400 3AF DC4 0.034 0.08 0.19 0.42 1.04 2.3 4.8 11.3 28 69 155
Phenom 3000 4ZF DC8 0.026 0.05 0.14 0.40 0.95 2.2 4.9 11.4 29 81 207
Core i7 4820K 3900 3VF QC9 0.015 0.03 0.08 0.17 0.43 0.9 2.0 4.2 10 25 78
Performance Gains
FFT Size K ---> 1 2 4 8 16 32 64 128 256 512 1024
64 bit SP/DP
Core 2 Duo 1.86 1.77 1.71 1.54 1.39 1.43 1.43 1.50 1.60 1.64 1.39
Phenom 1.24 1.20 1.42 1.55 1.32 1.31 1.15 1.15 1.26 1.48 0.62
Core i7 4820K 1.33 1.35 1.36 1.29 1.16 1.15 1.15 1.14 1.16 1.36 1.72
64 bit/32 bit SP
Core 2 Duo 1.14 1.09 1.08 1.05 1.03 1.06 1.06 1.06 1.06 1.04 0.90
Phenom 1.33 1.33 1.32 1.19 1.17 1.14 1.09 1.08 1.06 1.04 1.02
Core i7 4820K 1.25 1.27 1.24 1.21 1.22 1.22 1.23 1.22 1.20 1.22 1.20
64 bit SP FFT3/FFT1
Core 2 Duo 1.28 1.30 1.33 1.56 1.61 1.72 1.77 1.80 1.72 1.84 4.17
Phenom 1.24 1.36 1.46 1.31 2.34 2.51 2.45 2.97 3.02 2.84 3.58
Core i7 4820K 1.17 1.23 1.28 1.63 1.79 2.04 2.68 2.87 3.09 3.20 2.95
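The gain figures appear to be ratios of running times, with the time of the variant named second divided by the time of the variant named first, so that values above 1.0 mean the first variant is faster. This is inferred from the tables rather than stated, and the function name below is hypothetical. A minimal C sketch:

```c
#include <assert.h>
#include <stddef.h>

/* gain[i] = base_ms[i] / new_ms[i], for example FFT1 time over FFT3c time
   for the "FFT3/FFT1" rows, or DP time over SP time for "SP/DP". */
void gains(const double *base_ms, const double *new_ms, double *gain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(new_ms[i] > 0.0);
        gain[i] = base_ms[i] / new_ms[i];
    }
}
```

For the 1K size at 64 bit SP, the FFT1 times of 0.037, 0.026 and 0.014 ms against FFT3c times of 0.029, 0.021 and 0.012 ms give 1.28, 1.24 and 1.17, as in the last block of the table. The listed times are rounded, so some recomputed ratios can differ in the final digit.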